Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors
Abstract: Explainable Artificial Intelligence (XAI) techniques play a crucial role in increasing the understanding and trustworthiness of neural networks. Nonetheless, these techniques can be manipulated to produce misleading explanations. Blinding attacks can drastically alter a model's prediction and explanation by adding visually imperceptible artifacts to the input while maintaining the model's accuracy, which poses a serious challenge to the reliability of XAI methods. To address this challenge, we leverage statistical analysis to highlight the changes in CNN weights following blinding attacks. We introduce a method specifically designed to limit the effectiveness of such attacks at evaluation time, avoiding the need for extra training. The proposed method defends against the most recent explanation-aware adversarial attacks, achieving a roughly 99\% decrease in the Attack Success Rate (ASR) and a roughly 91\% reduction in the Mean Squared Error (MSE) between the original explanation and the defended (post-attack) explanation across three distinct types of attacks.
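The two evaluation metrics mentioned above can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it computes the MSE between an original explanation heatmap and a defended (post-attack) one, and the ASR over a batch of triggered inputs. The function and variable names (`explanation_mse`, `attack_success_rate`, `target_class`) are hypothetical.

```python
import numpy as np

def explanation_mse(original_expl: np.ndarray, defended_expl: np.ndarray) -> float:
    """Mean squared error between two explanation heatmaps of the same shape."""
    return float(np.mean((original_expl - defended_expl) ** 2))

def attack_success_rate(predictions: np.ndarray, target_class: int) -> float:
    """Fraction of triggered inputs classified as the attacker's target class."""
    return float(np.mean(predictions == target_class))

# Hypothetical usage: heatmaps assumed normalized to [0, 1],
# predictions obtained on inputs carrying the backdoor trigger.
orig = np.random.rand(32, 32)
defended = orig + 0.01 * np.random.randn(32, 32)
print(explanation_mse(orig, defended))
print(attack_success_rate(np.array([3, 3, 1, 3]), target_class=3))
```

A lower MSE indicates the defended explanation stays close to the clean one, while a lower ASR indicates fewer triggered inputs are misclassified into the attacker's target class.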