
Automatic Discovery of Visual Circuits

Published 22 Apr 2024 in cs.CV and cs.AI | (2404.14349v1)

Abstract: To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
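The idea of tracing the interdependence of neuron activations across layers can be illustrated with a small sketch. This is not the paper's implementation: it assumes functional connectivity is approximated by the Pearson correlation between unit activations in two adjacent layers, measured over a handful of concept examples, and that a fixed threshold selects the edges. The function names (`functional_connectivity`, `extract_edges`), the threshold value, and the synthetic activations are all hypothetical.

```python
import numpy as np

def functional_connectivity(acts_a, acts_b, eps=1e-8):
    """Pearson correlation between every unit in layer A and every unit in layer B.

    acts_a: (n_examples, n_units_a) activations on the concept examples
    acts_b: (n_examples, n_units_b) activations on the same examples
    Returns an (n_units_a, n_units_b) correlation matrix.
    """
    # Center each unit's activations across examples.
    a = acts_a - acts_a.mean(axis=0, keepdims=True)
    b = acts_b - acts_b.mean(axis=0, keepdims=True)
    # Scale each unit to unit norm so the dot product is a correlation.
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + eps)
    return a.T @ b

def extract_edges(conn, threshold=0.9):
    """Keep unit pairs whose absolute correlation meets the (assumed) threshold."""
    rows, cols = np.nonzero(np.abs(conn) >= threshold)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Synthetic stand-in for activations recorded from two layers (assumption:
# in practice these would come from forward hooks on a real vision model).
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(32, 4))
layer_b = rng.normal(size=(32, 3))
# Make unit 1 of layer B depend strongly on unit 2 of layer A.
layer_b[:, 1] = 2.0 * layer_a[:, 2] + 0.01 * rng.normal(size=32)

conn = functional_connectivity(layer_a, layer_b)
edges = extract_edges(conn)
print(edges)
```

In this toy setting the planted dependency between unit 2 of the first layer and unit 1 of the second should appear among the extracted edges. Correlation only captures linear co-variation, so a sketch like this says nothing about causal effect on model output, which the abstract reports testing separately.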

