Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Abstract: Pre-trained language models (PLMs) contain vast amounts of factual knowledge, but how this knowledge is stored in their parameters remains unclear. This paper delves into the complex task of understanding how factual knowledge is stored in multilingual PLMs, and introduces the Architecture-adapted Multilingual Integrated Gradients method, which localizes knowledge neurons more precisely than current methods and is more universal across architectures and languages. Moreover, we conduct an in-depth exploration of knowledge neurons, leading to two important discoveries: (1) Language-Independent Knowledge Neurons, which store factual knowledge in a form that transcends language. We design cross-lingual knowledge editing experiments demonstrating that PLMs can accomplish this task based on language-independent neurons; (2) Degenerate Knowledge Neurons, a novel type of neuron showing that different knowledge neurons can store the same fact. Their functional overlap endows PLMs with a robust mastery of factual knowledge. We design fact-checking experiments proving that degenerate knowledge neurons can help PLMs detect wrong facts. Experiments corroborate these findings, shedding light on the mechanisms of factual knowledge storage in multilingual PLMs and contributing valuable insights to the field. The code is available at https://github.com/heng840/AMIG.
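The method builds on Integrated Gradients, the standard attribution technique of Sundararajan et al. (2017), which scores each input (or neuron activation) by averaging gradients along a straight-line path from a baseline to the actual value. The following is a minimal, generic sketch of that core computation, not the paper's architecture-adapted multilingual variant; the `grad_fn` callback and the toy quadratic model are illustrative assumptions.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    """Riemann-sum approximation of Integrated Gradients.

    grad_fn: callable returning the gradient of the model output
             with respect to its input vector.
    x:        input to attribute; baseline defaults to all zeros.
    """
    if baseline is None:
        baseline = np.zeros_like(x)
    # Average the gradients at midpoints along the path baseline -> x.
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    # Attribution = (input - baseline) * average gradient along the path.
    return (x - baseline) * avg_grad

# Toy model: f(x) = sum(x_i^2), so grad f(x) = 2x. With a zero baseline,
# the exact attribution of coordinate i is x_i^2, and the attributions
# satisfy the completeness axiom: they sum to f(x) - f(baseline).
x = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(lambda v: 2 * v, x)
print(attr)        # -> [1. 4. 9.]
print(attr.sum())  # -> 14.0, equal to f(x) - f(0)
```

In the knowledge-neuron setting, `x` would be a feed-forward neuron's activation on a fact-expressing prompt and `grad_fn` the gradient of the model's prediction for the fact's answer; neurons with high attribution scores are taken as the fact's storage sites.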