Adversarial Feature Map Pruning for Backdoor
Abstract: Deep neural networks (DNNs) are widely used in critical applications such as autonomous vehicles and medical diagnosis, yet their security is threatened by backdoor attacks, which implant artificial trigger patterns into a subset of the training data. Existing defense strategies primarily rely on reverse engineering to reproduce the attacker's backdoor trigger and then repair the DNN by stamping the recovered trigger onto inputs and fine-tuning the model with ground-truth labels. However, when the trigger is complex and invisible, the defender cannot reproduce it successfully, and the DNN cannot be repaired because the trigger is never effectively removed. In this work, we propose Adversarial Feature Map Pruning for Backdoor (FMP) to mitigate backdoors in DNNs. Rather than reproducing backdoor triggers, FMP prunes the backdoor feature maps, i.e., the feature maps that have been trained to extract backdoor information from inputs. After pruning these feature maps, FMP fine-tunes the model on a secure subset of the training data. Our experiments demonstrate that, compared with existing defenses, FMP effectively reduces the Attack Success Rate (ASR) even against the most complex and invisible triggers (e.g., FMP decreases the ASR to 2.86% on CIFAR-10, 19.2% to 65.41% lower than the baselines). Moreover, unlike conventional defenses, which tend to exhibit low robust accuracy (RA, the model's accuracy on poisoned data), FMP achieves a higher RA, indicating its superiority in maintaining model performance while mitigating backdoor attacks (e.g., FMP obtains 87.40% RA on CIFAR-10). Our code is publicly available at https://github.com/retsuh-bqw/FMP.
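As a rough illustration of the score-prune-fine-tune pipeline the abstract describes, here is a minimal PyTorch sketch. It is not the authors' implementation (see their repository for that): the layer handle `conv`, the secure subset `clean_loader`, the budget `prune_k`, and the use of random noise in place of the paper's adversarially generated feature-map perturbations are all assumptions made for illustration.

```python
# Minimal sketch of the FMP workflow (illustrative, not the authors' code):
# 1) score each feature map by how strongly perturbing it changes predictions,
# 2) zero out the most sensitive ("backdoor") maps,
# 3) fine-tune on a small clean ("secure") subset to recover accuracy.
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_feature_maps(model, conv, batch, eps=0.5):
    """One sensitivity score per output channel of the Conv2d layer `conv`."""
    x, _ = batch
    base = model(x).argmax(dim=1)  # predictions without perturbation
    scores = []
    for c in range(conv.out_channels):
        def perturb(module, inp, out, c=c):
            # Random noise stands in for the paper's adversarial perturbation.
            out = out.clone()
            out[:, c] += eps * torch.randn_like(out[:, c])
            return out  # returning a value replaces the layer's output
        handle = conv.register_forward_hook(perturb)
        flipped = (model(x).argmax(dim=1) != base).float().mean().item()
        handle.remove()
        scores.append(flipped)  # fraction of predictions flipped
    return torch.tensor(scores)

def prune_channels(conv, channels):
    """Silence the given output channels by zeroing their weights."""
    with torch.no_grad():
        conv.weight[channels] = 0.0
        if conv.bias is not None:
            conv.bias[channels] = 0.0

def repair(model, conv, clean_loader, prune_k=8, epochs=5, lr=1e-3):
    model.eval()
    scores = score_feature_maps(model, conv, next(iter(clean_loader)))
    prune_channels(conv, scores.topk(prune_k).indices)
    # Fine-tune on the secure subset to recover clean accuracy.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
```

Note that this sketch scores a single layer and prunes by zeroing weights, which fine-tuning can undo; a fuller implementation would generate the perturbations adversarially, sweep all layers, and mask gradients (or remove channels structurally) so pruned maps stay pruned.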