
Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

Published 21 Dec 2023 in cs.LG (arXiv:2312.13628v2)

Abstract: Deep neural networks (DNNs) have been demonstrated to be vulnerable to well-crafted adversarial examples, which are generated through either well-conceived $\mathcal{L}_p$-norm restricted or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish, and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, a modification in income would inevitably impact features like the debt-to-income ratio within a banking system. By considering the underappreciated causal generating process, first, we pinpoint the source of the vulnerability of DNNs via the lens of causality, then give theoretical results to answer where to attack. Second, considering the consequences of the attack interventions on the current state of the examples to generate more realistic adversarial examples, we propose CADE, a framework that can generate Counterfactual ADversarial Examples to answer how to attack. The empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.


Summary

  • The paper proposes CADE, a method that leverages causal processes to generate more realistic adversarial examples.
  • It employs counterfactual reasoning to alter latent variables, ensuring modifications align with the data's causal structure.
  • Empirical results demonstrate CADE's effectiveness against DNNs in white-box, transfer-based, and random intervention attack scenarios.

Introduction

Advances in deep learning have led to significant improvements across various tasks and industries. Deep Neural Networks (DNNs) are at the heart of these advancements, but they are not without vulnerabilities. One major concern is their susceptibility to adversarial examples: inputs deliberately crafted to cause a DNN to make a mistake. Most adversarial attacks assume attackers can modify any feature of the data. However, this overlooks the fact that real-world data often adheres to a causal generating process. Ignoring this process can lead to unrealistic and impractical adversarial examples.
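
To make the causal constraint concrete, here is a minimal Python sketch of the income example from the abstract. The variable names and the toy mechanism (a debt-to-income ratio computed from income and debt) are illustrative assumptions, not taken from the paper:

```python
# Toy causal process: income and debt are root causes; the
# debt-to-income ratio (dti) is a deterministic child of both.
income, debt = 60_000.0, 18_000.0
dti = debt / income

# Naive attack: perturb income but leave dti untouched. The record now
# violates dti = debt / income, so it is easy to flag as unrealistic.
naive_record = (income + 10_000.0, debt, dti)

# Causally consistent attack: intervene on income, then propagate
# the consequence to its causal descendant.
adv_income = income + 10_000.0
adv_record = (adv_income, debt, debt / adv_income)

print(naive_record)  # (70000.0, 18000.0, 0.3)      <- inconsistent
print(adv_record)    # (70000.0, 18000.0, 0.2571..) <- consistent
```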

Theoretical Foundations

Addressing the gap between traditional adversarial attacks and realistic scenarios, the paper presents a methodology centered on the causal process that generates the data. The authors dissect DNN vulnerabilities through a causal lens and offer theoretical results on where to direct an attack. They distinguish between attacking observable variables, such as the features of an image, and latent variables, the underlying causes that shape observable characteristics. The theory shows that intervening on variables causally related to the prediction target is an effective attack strategy. In contrast to methods that adjust variables indiscriminately, the authors propose attacking the children and co-parents of the prediction target within the causal model, which perturbs the evidence the model relies on while leaving the target itself unchanged.
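
As a sketch of how this selection rule could be implemented, the following snippet identifies the children and co-parents of a target node in a causal DAG. The graph, node names, and helper function are illustrative assumptions rather than the paper's implementation:

```python
import networkx as nx

def attackable_variables(dag: nx.DiGraph, target: str) -> set:
    """Children and co-parents of `target`: variables that can be
    intervened on without changing the target itself."""
    children = set(dag.successors(target))
    # Co-parents: other direct causes of the target's children.
    co_parents = {p for c in children for p in dag.predecessors(c)}
    return (children | co_parents) - {target}

# Toy causal graph: Y is the prediction target. X1 causes Y,
# X2 is a child of Y, and X3 shares the child X2 with Y.
g = nx.DiGraph([("X1", "Y"), ("Y", "X2"), ("X3", "X2")])
print(attackable_variables(g, "Y"))  # {'X2', 'X3'}
```

Note that the target's parents (here X1) are excluded: intervening on a cause of the target could change the target's own value, which would defeat the purpose of the attack.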

Methodology Development

Based on these concepts, the paper introduces CADE, a framework for generating Counterfactual ADversarial Examples. The approach formulates realistic attacks through counterfactual reasoning: asking what an example would look like if the world were different in a specific way. CADE generates adversarial examples by intervening on the causes of data features and propagating the consequences of those interventions, so that the result stays consistent with the causal model. Its applications range from attacking image classifiers, where the causal process links latent attributes to pixels, to financial models predicting creditworthiness. Unlike existing attacks that perturb pixel values directly, CADE modifies the latent variables that give rise to those pixel values, keeping the perturbation aligned with the causal relationships inherent in the data; a minimal counterfactual computation is sketched below.
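
Counterfactual computation in a structural causal model typically follows Pearl's three steps: abduction, action, and prediction. The sketch below runs those steps on a hand-written linear SCM over two latents; the mechanism, coefficient, and function name are illustrative assumptions, not CADE's learned generator:

```python
# Linear SCM over latents: z2 := A * z1 + u2, with exogenous noise u2.
A = 0.8

def counterfactual(z1_obs, z2_obs, z1_new):
    # 1) Abduction: recover the exogenous noise from the observation.
    u2 = z2_obs - A * z1_obs
    # 2) Action: intervene on the cause, do(z1 = z1_new).
    z1_cf = z1_new
    # 3) Prediction: push the intervention through the mechanism,
    #    reusing the same noise so the example stays individualized.
    z2_cf = A * z1_cf + u2
    return z1_cf, z2_cf

print(counterfactual(z1_obs=1.0, z2_obs=1.1, z1_new=2.0))  # (2.0, 1.9)
```

In an image setting, the counterfactual latents would then be decoded back to pixel space by a generative model, yielding an adversarial example whose features remain mutually consistent.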

Empirical Validation

The framework is evaluated in diverse attack scenarios, including white-box settings, where the model internals are known, and transfer-based attacks, where adversarial examples crafted against one model are applied to others. The empirical results demonstrate the robustness and efficacy of CADE in generating adversarial examples that successfully fool DNNs. Notably, the strategy retains its effectiveness even when interventions are applied randomly, without specific knowledge of the model's vulnerabilities. These results underscore the value of integrating a causal perspective into the generation of adversarial examples.
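
For reference, the white-box and transfer-based protocols differ only in which model scores the adversarial examples. The sketch below uses hypothetical `source_model` and `target_model` placeholders; the metric itself is the standard attack success rate:

```python
import torch

@torch.no_grad()
def attack_success_rate(model, x_adv, y_true):
    """Fraction of adversarial inputs the classifier gets wrong."""
    preds = model(x_adv).argmax(dim=1)
    return (preds != y_true).float().mean().item()

# White-box: score x_adv on the model the attack was crafted against.
# asr_white = attack_success_rate(source_model, x_adv, y_true)
# Transfer-based: score the same x_adv on an unseen target model.
# asr_transfer = attack_success_rate(target_model, x_adv, y_true)
```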

Conclusion

In conclusion, CADE represents a stride toward realistic and practical adversarial attacks. It encourages further research into incorporating causality in adversarial learning and points toward new avenues for both attack and defense strategies. While CADE advances the generation of adversarial examples, challenges remain, such as the limited availability of causal knowledge in real-world domains. Future work is directed toward methods that can operate with partial causal information and toward extending the framework to physical-world scenarios.
