Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples
Abstract: Deep neural networks (DNNs) are vulnerable to well-crafted \emph{adversarial examples}, generated through either $\mathcal{L}_p$-norm-restricted or unrestricted attacks. However, most of these approaches assume that adversaries can modify any feature at will and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, a modification of income would inevitably affect features such as the debt-to-income ratio within a banking system. By taking the underappreciated causal generating process into account, we first pinpoint the source of DNNs' vulnerability through the lens of causality and give theoretical results that answer \emph{where to attack}. Second, to generate more realistic adversarial examples that account for the consequences of attack interventions on the current state of an example, we propose CADE, a framework that generates \textbf{C}ounterfactual \textbf{AD}versarial \textbf{E}xamples, answering \emph{how to attack}. Empirical results demonstrate CADE's effectiveness through competitive performance across diverse attack scenarios, including white-box, transfer-based, and random-intervention attacks.
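The abstract's banking example can be made concrete with a toy structural causal model (SCM). The sketch below is purely illustrative, not the paper's actual model: the two-node graph (income → debt-to-income ratio), the function names, and the numbers are all assumptions chosen to show why an attack on one feature must propagate to its causal descendants.

```python
def generate(income, debt):
    """Causal generating process: income and debt are exogenous;
    the debt-to-income ratio is a deterministic child of both."""
    return {"income": income, "debt": debt,
            "debt_to_income": debt / income}

def counterfactual(observed, new_income):
    """Counterfactual intervention in the spirit of
    abduction-action-prediction: keep the unattacked exogenous
    factors of the observed example fixed (here, debt), set income
    to the attacked value, then recompute every downstream feature."""
    cf = dict(observed)
    cf["income"] = new_income                          # intervention (the attack)
    cf["debt_to_income"] = cf["debt"] / new_income     # consequence propagates
    return cf

observed = generate(income=50_000.0, debt=20_000.0)
attacked = counterfactual(observed, new_income=40_000.0)

# A naive feature-space attack would perturb income alone and leave
# debt_to_income at its old value of 0.4, yielding an example no real
# data-generating process could emit; the counterfactual version
# updates it to 0.5.
print(observed["debt_to_income"])  # 0.4
print(attacked["debt_to_income"])  # 0.5
```

This is the distinction the abstract draws: an attack is modeled as an intervention on the causal graph, so all descendants of the attacked feature are regenerated rather than held fixed.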