- Physical backdoor attacks can achieve up to a 90% success rate on models such as VGG16, ResNet50, and DenseNet when 15-25% of the training data is poisoned.
- The attacks are evaluated under real-world conditions, with physical objects placed on or near facial features serving as triggers, exposing the shortcomings of defenses designed for digital triggers.
- The study underscores the need for robust, physically aware mitigation strategies that reduce false positives and secure AI systems in practical deployments.
Overview of Backdoor Attacks Against Deep Learning Systems in the Physical World
The paper "Backdoor Attacks Against Deep Learning Systems in the Physical World" explores the practical implications of backdoor attacks that use physical objects as triggers against deep learning systems. The study centers on the question of whether backdoor attacks using physical triggers pose a credible threat to models deployed in real-world applications such as facial recognition.
Key Contributions
The paper makes several notable contributions to the understanding of physical backdoors in deep learning:
- Demonstration of Physical Backdoor Viability: The study empirically demonstrates that backdoor attacks can be executed effectively with physical trigger objects, evaluated across model architectures including VGG16, ResNet50, and DenseNet. At poisoning rates of 15-25%, physical backdoor attacks achieve a 90% success rate while maintaining high model accuracy on clean, benign inputs.
- Evaluation of Real-World Conditions: The paper presents a thorough empirical analysis based on a custom dataset of 3,205 images collected from 10 volunteers, with common physical objects serving as triggers. The experiments simulate real-world settings and run-time image artifacts, such as blurring, compression, and noise, to validate the robustness of the attacks under varied conditions.
- Investigation into Physical Trigger Efficacy: The research finds that attack efficacy is highly contingent on trigger placement relative to critical facial features. Triggers must sit on the face itself to achieve a high attack success rate; off-face triggers such as earrings proved ineffective.
- Ineffectiveness of Current Defenses: The paper tests state-of-the-art digital backdoor defenses against physical triggers. Defenses such as Neural Cleanse, Spectral Signatures, Activation Clustering, and STRIP largely fail against physical backdoors because their underlying assumptions, tuned to artificial digital triggers, do not hold in a physical context.
- Potential Mitigation Strategies: Additionally, the paper provides preliminary strategies to reduce false positives caused by benign objects that resemble the trigger, and conversely shows how an attacker could use false-positive training to evade detection.
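The dirty-label poisoning underlying the attack described above can be sketched as a toy, numpy-only example. The function name `poison_dataset`, the white-square patch, and the array shapes are hypothetical stand-ins; the paper's actual triggers are physical objects photographed on the wearer, not digital patches.

```python
import numpy as np

def poison_dataset(images, labels, target_label, rate=0.2, seed=0):
    """Dirty-label poisoning sketch: stamp a small patch (a digital
    stand-in for the physical trigger) on a random fraction of images
    and relabel them to the attacker's chosen target class."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -6:, -6:] = 1.0   # white 6x6 patch in the corner
    labels[idx] = target_label
    return images, labels, idx
```

A model trained on the returned arrays would associate the patch with `target_label` while behaving normally on unmodified inputs, mirroring the 15-25% poisoning regime the paper evaluates.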
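The run-time artifacts used in the robustness evaluation (blurring, compression, noise) can be approximated digitally. The numpy-only sketch below uses a box blur and coarse quantization as crude stand-ins for camera blur and lossy compression; the function names and parameter values are illustrative, not the paper's exact pipeline.

```python
import numpy as np

def add_noise(img, sigma=0.05, seed=None):
    """Additive Gaussian sensor noise, clipped to the valid [0, 1] range."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def box_blur(img, k=3):
    """Simple k-by-k mean filter as a stand-in for camera blur."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def quantize(img, levels=32):
    """Coarse intensity quantization as a crude stand-in for compression."""
    return np.round(img * (levels - 1)) / (levels - 1)
```

Applying such transforms to triggered test images checks whether the backdoor still fires under degraded capture conditions, which is the robustness question the paper's experiments address.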
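Of the defenses found ineffective above, STRIP illustrates the kind of assumption that breaks down: it blends an input with random clean images and measures the entropy of the model's predictions, on the premise that a (digitally) backdoored input keeps predicting the target class and so yields abnormally low entropy. A minimal sketch of that check, assuming a hypothetical `model_predict` that returns a softmax probability vector:

```python
import numpy as np

def strip_entropy(model_predict, x, clean_images, n=8, alpha=0.5, seed=0):
    """STRIP-style check (sketch): average prediction entropy over
    blends of the input x with n randomly chosen clean images.
    Low average entropy suggests a trigger that survives blending."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(clean_images), size=n, replace=False)
    entropies = []
    for i in idx:
        blended = alpha * x + (1 - alpha) * clean_images[i]
        p = model_predict(blended)
        entropies.append(float(-np.sum(p * np.log(p + 1e-12))))
    return float(np.mean(entropies))
```

In practice the entropy threshold is calibrated on clean inputs; the paper's finding is that this and similar digital-trigger assumptions do not transfer to physical triggers.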
Implications and Future Considerations
The implications of this research are profound for both practical deployment and theoretical understanding of AI systems:
- Practical Deployments: The findings underscore the need for more robust defensive strategies specifically tailored to address physical backdoor threats, which may not be detected by existing digital-focused methods.
- Theoretical Development: This research fills a significant gap in the literature on adversarial threats in the physical domain, indicating a need for a shift in how backdoor attacks are modeled and defended against.
- Future Directions in AI: Future research may focus on developing defensive mechanisms that do not rest on assumptions invalid in physical environments, and on exploring other domains subject to similar vulnerabilities.
In summary, this paper presents a comprehensive exploration of the threat posed by physical backdoor attacks, highlighting their efficacy and the limitations of current defenses. It calls for a reevaluation of defense strategies to ensure the security of AI systems deployed in sensitive real-world applications.