Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability

Published 24 Sep 2024 in cs.CL | arXiv:2409.15827v2

Abstract: As LLMs advance in their linguistic capacity, understanding how they capture aspects of language competence remains a significant challenge. This study therefore employs psycholinguistic paradigms in English, which are well-suited for probing deeper cognitive aspects of language processing, to explore neuron-level representations in LLMs across three tasks: sound-shape association, sound-gender association, and implicit causality. Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality. Targeted neuron ablation and activation manipulation reveal a crucial relationship: when GPT-2-XL displays a linguistic ability, specific neurons correspond to that competence; conversely, the absence of such an ability indicates a lack of specialized neurons. This study is the first to utilize psycholinguistic experiments to investigate deep language competence at the neuron level, providing a new level of granularity in model interpretability and insights into the internal mechanisms driving language ability in transformer-based LLMs.

Summary

  • The paper demonstrates that targeted neuron ablation exposes specific neurons responsible for encoding distinct language competences in GPT-2-XL.
  • It finds that the model exhibits human-like performance in sound-gender association and implicit causality tasks, highlighting robust neuron-level coding.
  • The study leverages psycholinguistic paradigms to reveal discrepancies in neuron activation, particularly in the challenging sound-shape association task.

The paper "Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability" examines how LLMs such as GPT-2-XL internalize and express language competences at the neuron level, using psycholinguistic paradigms as probes. The study matters because the rapidly growing linguistic capabilities of LLMs have outpaced our understanding of their internal workings.

To probe these mechanisms, the authors designed three distinct tasks:

  1. Sound-Shape Association - This task examines the model's ability to link phonetic properties of words to visual shapes, a cognitive ability observed in humans where certain sounds are intuitively associated with specific shapes (e.g., "bouba" with rounded shapes and "kiki" with spiky shapes).
  2. Sound-Gender Association - This investigates the model's capacity to associate sounds with gendered perceptions, reflecting sociolinguistic patterns where certain phonetic elements might be perceived as more feminine or masculine.
  3. Implicit Causality - This task assesses the model's understanding of causal relationships embedded in linguistic structures, crucial for grasping coherence and inferential aspects of language processing.
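The forced-choice logic behind such paradigms can be sketched in miniature: the model is given a context and "chooses" whichever candidate completion it assigns the higher probability. The snippet below illustrates this decision rule with hand-picked toy logits standing in for real model scores; the prompt and the numbers are hypothetical illustrations, not the paper's stimuli.

```python
import math

def softmax(logits):
    # Convert raw scores to probabilities (numerically stable).
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def forced_choice(logits, option_a, option_b):
    # The model "chooses" whichever completion it assigns higher probability.
    probs = softmax(logits)
    return option_a if probs[option_a] > probs[option_b] else option_b

# Hypothetical next-token scores for a sound-shape prompt like
# "The round, blobby object is called a ..."
toy_logits = {"bouba": 2.1, "kiki": 0.7, "table": -1.0}

print(forced_choice(toy_logits, "bouba", "kiki"))  # -> bouba
```

With a real model, `toy_logits` would be replaced by the LLM's next-token logits for each candidate; the comparison logic stays the same.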

Key findings from these experiments demonstrated that GPT-2-XL exhibited performance discrepancies across different tasks. Specifically:

  • Sound-Shape Association: GPT-2-XL struggled with this task, indicating a lack of clear neuron-level representation for this form of cognitive association.
  • Sound-Gender Association and Implicit Causality: The model showcased human-like abilities, suggesting more robust neuron-level coding for these linguistic competences.

A critical methodological component of the study involved targeted neuron ablation and activation manipulation to uncover the neurons' roles in specific linguistic capabilities. The results reveal:

  • Neuron Specificity: There is a discernible relationship between the model's linguistic competence and specific neuron activation. When GPT-2-XL demonstrated a particular linguistic ability, it corresponded to the activation of specific neurons.
  • Absence of Ability: Conversely, the absence of a linguistic competence in the model correlated with the lack of specialized neuron activation.
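Conceptually, the ablation procedure described above can be illustrated on a toy MLP: zero out selected hidden units during the forward pass and compare the output with the unablated baseline. This is a minimal NumPy sketch of the idea, not the authors' implementation; the weights, input, and ablated indices are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one transformer MLP block: hidden = relu(x @ W1), out = hidden @ W2.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 8))

def mlp_forward(x, ablate=()):
    hidden = np.maximum(x @ W1, 0.0)
    hidden[..., list(ablate)] = 0.0  # targeted ablation: silence the chosen neurons
    return hidden @ W2

x = rng.normal(size=(8,))
baseline = mlp_forward(x)
ablated = mlp_forward(x, ablate=[3, 7])

# The size of the output shift measures how much those neurons contribute
# to the model's behavior on this input.
shift = np.linalg.norm(baseline - ablated)
print(shift)
```

In the paper's setting, the same idea applies to GPT-2-XL's MLP activations, and activation manipulation is the inverse operation: scaling selected units up rather than zeroing them.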

This approach yielded unique insights into model interpretability by identifying "language competence neurons." It marks the first use of psycholinguistic experiments to probe neuron-level representations in transformer-based LLMs, introducing a nuanced framework for understanding the internal mechanisms driving language abilities. The findings advance the interpretability of LLMs, deepening our understanding of how these models process and generate human-like language at the neuron level.
