
Group Crosscoders for Mechanistic Analysis of Symmetry

Published 31 Oct 2024 in cs.LG (arXiv:2410.24184v2)

Abstract: We introduce group crosscoders, an extension of crosscoders that systematically discover and analyse symmetrical features in neural networks. While neural networks often develop equivariant representations without explicit architectural constraints, understanding these emergent symmetries has traditionally relied on manual analysis. Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group. Applied to InceptionV1's mixed3b layer using the dihedral group $\mathrm{D}_{32}$, our method reveals several key insights: First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, providing more precise separation than standard sparse autoencoders. Second, our transform block analysis enables the automatic characterisation of feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance. These results demonstrate that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.


Summary

  • The paper’s main contribution is introducing group crosscoders to automatically extract and cluster symmetrical features in neural networks.
  • The methodology uses dictionary learning on activation vectors from transformed inputs and cosine similarity to reveal distinct feature clusters.
  • Experimental results on InceptionV1’s mixed3b layer clearly separate curvilinear and angular features, enhancing network interpretability.

Analysis of "Group Crosscoders for Mechanistic Analysis of Symmetry"

The paper "Group Crosscoders for Mechanistic Analysis of Symmetry" by Liv Gorton contributes to the understanding of symmetry in neural networks by introducing group crosscoders. This approach extends traditional crosscoders, which were originally designed to find analogous features across neural network layers, so that they can systematically explore and analyse symmetry within a single network.

Key Contributions

The core contribution of the paper is the introduction of group crosscoders. These are applied to neural networks to identify and cluster symmetrical features, providing a new layer of interpretability into how neural networks develop symmetry properties even when these are not explicitly enforced. The approach is demonstrated on the mixed3b layer of the InceptionV1 architecture using the dihedral group $\mathrm{D}_{32}$, whose elements comprise rotations and reflections.
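To make the group concrete: $\mathrm{D}_{32}$ contains 64 elements, namely 32 rotations by multiples of 360°/32 = 11.25° plus 32 reflections. A minimal sketch (the helper name is hypothetical, not from the paper's code) enumerates these elements as 2×2 orthogonal matrices:

```python
import numpy as np

def dihedral_elements(n=32):
    """Enumerate the 2n elements of the dihedral group D_n as 2x2 matrices."""
    elements = []
    for k in range(n):
        theta = 2 * np.pi * k / n
        c, s = np.cos(theta), np.sin(theta)
        elements.append(np.array([[c, -s], [s, c]]))   # rotation by theta
        elements.append(np.array([[c, s], [s, -c]]))   # reflection (axis at theta/2)
    return elements

elems = dihedral_elements()
print(len(elems))  # 64 group elements
```

Every element is an orthogonal matrix, with rotations having determinant +1 and reflections determinant −1.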

The novelty of group crosscoders lies in their ability to automatically extract and analyze equivariant features, without the need for predefined architectural constraints. By incorporating dictionary learning across transformed input versions under a symmetry group, these crosscoders not only reveal inherent symmetry but also cluster features into interpretable families. This clustering facilitates a more granular understanding of feature types, surpassing the resolution provided by standard sparse autoencoders.

Methodology

Group crosscoders build on the existing crosscoder framework by learning from activation patterns of transformed inputs under group actions, instead of relying on cross-layer or cross-model parallels. The paper performs dictionary learning on vectors that concatenate the activations produced by multiple transformed versions of the same input image, a departure from the single-input activation vectors used in conventional dictionary learning.
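The vector construction described above can be sketched as follows. Here `transforms` and `layer_activations` are toy stand-ins for the real image transforms and the network forward pass, so this is an assumption-laden illustration rather than the paper's implementation:

```python
import numpy as np

def group_activation_vector(image, transforms, layer_activations):
    """Concatenate the layer's activations for every transformed copy of one input."""
    blocks = [layer_activations(t(image)) for t in transforms]
    return np.concatenate(blocks)  # shape: (|G| * d_layer,)

# Toy stand-ins: 4 rotations instead of all of D_32, and channel means
# instead of a real forward pass through mixed3b.
transforms = [lambda x, k=k: np.rot90(x, k) for k in range(4)]
layer_activations = lambda x: x.mean(axis=(0, 1))

image = np.random.rand(8, 8, 3)
vec = group_activation_vector(image, transforms, layer_activations)
print(vec.shape)  # (12,) = 4 transforms x 3 "channels"
```

Dictionary learning then runs over these long concatenated vectors, so each learned feature carries one block of weights per group element.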

The group crosscoder's ability to predict the activations of transformed inputs from an untransformed input introduces a methodological innovation that aids interpretability. The training process constructs datasets from ImageNet and adopts cosine similarity as the metric for measuring feature symmetry via a distance matrix. A UMAP (Uniform Manifold Approximation and Projection) visualisation is used to represent the clustering of feature families.
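One simple way to compare features up to a symmetry, sketched here under the assumption that each dictionary feature is laid out as |G| equal-sized transform blocks (the helper names are illustrative, not from the paper's code), is to take the minimum cosine distance over block permutations of one feature:

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance: 0 for parallel vectors, 2 for anti-parallel."""
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def min_transform_distance(f, g, n_blocks):
    """Minimum cosine distance over cyclic block permutations of g."""
    blocks = np.split(g, n_blocks)
    dists = []
    for shift in range(n_blocks):
        permuted = np.concatenate(blocks[shift:] + blocks[:shift])
        dists.append(cosine_distance(f, permuted))
    return min(dists)

rng = np.random.default_rng(0)
f = rng.normal(size=12)
g = np.concatenate(np.split(f, 4)[1:] + np.split(f, 4)[:1])  # f with blocks rotated

print(min_transform_distance(f, g, 4) < 1e-9)  # True: g matches f up to a transform
```

Under such a distance, two features that are the same shape at different orientations land close together, which is what allows clustering into symmetry-aware feature families.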

Experimental Results

The results demonstrate that group crosscoders can discern and classify features into distinct clusters corresponding to previously hypothesised feature families. The experiment on InceptionV1's mixed3b layer shows clear separation and interpretability across these clusters, distinguishing, for example, curvilinear from angular features in neural network representations. The symmetry analysis of features reveals how different geometric traits exhibit distinct transformation patterns: curves require a full 360° rotation to map back onto themselves, whereas lines repeat under 180° rotation.
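This distinction can be checked with a toy example (illustrative only, not the paper's analysis): a straight segment maps onto itself under a 180° rotation, while an asymmetric curve does not:

```python
import numpy as np

def rotate(points, deg):
    """Rotate an (n, 2) array of points by deg degrees about the origin."""
    t = np.deg2rad(deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return points @ R.T

def same_set(a, b, decimals=6):
    """Compare two point clouds as unordered sets, up to rounding."""
    a_r, b_r = np.round(a, decimals), np.round(b, decimals)
    return np.array_equal(a_r[np.lexsort(a_r.T)], b_r[np.lexsort(b_r.T)])

line = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])    # straight segment
curve = np.array([[-1.0, 0.5], [0.0, 0.0], [1.0, 0.5]])   # bends upward

print(same_set(rotate(line, 180), line))    # True: line has 180-degree symmetry
print(same_set(rotate(curve, 180), curve))  # False: curve needs the full 360 degrees
```

A feature detecting the line can therefore be invariant to 180° rotations, while a curve detector's transform blocks only repeat after a full turn, exactly the kind of distinction the transform block analysis surfaces.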

Implications and Future Directions

The introduction of group crosscoders marks an advancement in mechanistic interpretability approaches. By automating the discovery and clustering of symmetrical features, this method reduces the ambiguity in understanding neural network feature representation, especially concerning innate equivariant properties. This advancement adds depth to ongoing research in vision interpretability by providing a structured approach to symmetry analysis, potentially applicable across modalities beyond image processing or for groups with more complex symmetries.

Future research could extend this methodology to groups beyond the dihedral group, perhaps incorporating scaling or colour transformations, thereby broadening the applicability and robustness of group crosscoders. Additionally, cross-examination with other architectures could compare the prevalence and nature of symmetrical feature representations across models, potentially informing design principles in neural network development.

In conclusion, the development of group crosscoders offers an insightful tool for AI researchers looking to explore and quantify neural network symmetries. The methodology sets a precedent for future explorations in mechanistic interpretability, offering potential pathways for rich academic inquiry.

