Automatic Discovery of Visual Circuits
Abstract: To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
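The core idea of tracing "functional connectivity" between layers can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes activations for a few concept examples have already been collected from two consecutive layers, and scores neuron interdependence with absolute Pearson correlation (the function name and the 0.8 threshold are hypothetical choices for the sketch).

```python
import numpy as np

def functional_connectivity(acts_a, acts_b, threshold=0.8):
    """Mark strongly interdependent neuron pairs across two layers.

    acts_a: (n_examples, n_neurons_a) activations for the earlier layer.
    acts_b: (n_examples, n_neurons_b) activations for the later layer.
    Returns a boolean (n_neurons_a, n_neurons_b) edge matrix: True where
    the absolute Pearson correlation over the examples meets `threshold`.
    """
    # Standardize each neuron's activations over the example set.
    a = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    b = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    # Correlation matrix between every neuron pair across the two layers.
    corr = a.T @ b / len(acts_a)
    return np.abs(corr) >= threshold
```

Applying such a score between every pair of adjacent layers, and keeping only the edges that pass the threshold, yields a candidate subgraph whose effect on the output can then be checked with causal interventions.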