- The paper introduces dual induction heads, distinguishing between token-level copying and concept-level semantic transfer in LLMs.
- Causal patching and ablation experiments reveal that concept heads drive semantic tasks like translation and synonym generation while token heads support verbatim copying.
- Findings underscore a mechanistic dual-route framework that informs interpretability, enables generative control, and guides multilingual model design.
The Dual-Route Model of Induction in LLMs
Background and Motivation
The study introduces a new mechanistic account of copying and semantic transfer in LLMs—the Dual-Route Model of Induction. While prior work has established the existence of “induction heads”, specialized attention heads responsible for in-context token copying [elhage2021mathematical], this paper identifies and characterizes a novel class: concept-level induction heads, which operate on lexical units spanning multiple tokens. The work is deeply inspired by the dual-route model of human reading, which distinguishes between sublexical (grapheme-level, phonological) and lexical (whole-word, semantic) processing routes [marshallnewcombe1966]. Applying this analogy, the authors explore how LLMs encode and manipulate both literal token sequences and abstract concepts, with substantial implications for interpretability and control of generative behavior.
Methodology: Identification and Causal Analysis of Induction Heads
The paper operationalizes “concept induction heads” through causal intervention techniques—patching individual attention head activations across contexts and measuring their impact on predicted token probabilities. The authors define two distinct scores for each head:
- Concept Copying Score: Quantifies the increase in future token probability when patching the head in prompts containing multi-token concepts sampled from the CounterFact dataset [meng2022locating].
- Token Copying Score: Evaluates the same intervention for random tokens, tracking verbatim token-level copying.
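The scoring logic described above can be sketched as a probability-delta computation. This is a minimal toy illustration of the bookkeeping only, not the paper's implementation: the real experiments patch attention-head activations inside a transformer, whereas here `p_clean` and `p_patched` are hand-made toy distributions and `copying_score` is a hypothetical helper name.

```python
import numpy as np

def copying_score(p_patched: np.ndarray, p_clean: np.ndarray, target_ids: list[int]) -> float:
    """Mean increase in target-token probability caused by patching one head.

    Concept copying scores use multi-token concept targets; token copying
    scores use random-token targets -- the delta computation is the same.
    """
    return float(np.mean([p_patched[t] - p_clean[t] for t in target_ids]))

# Toy vocabulary of 5 tokens; patching a head raises the probability of token 2.
p_clean = np.array([0.2, 0.2, 0.1, 0.3, 0.2])
p_patched = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

concept_score = copying_score(p_patched, p_clean, target_ids=[2])
print(round(concept_score, 2))  # 0.5
```

Ranking heads by the two scores separately is what surfaces the two distinct populations described below.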
Heads are ranked and analyzed across several models (Llama-2-7b, Llama-3-8b, OLMo-2-7b, Pythia-6.9b), and the analysis consistently finds a strong separation between concept and token induction heads: concept copiers cluster in early-to-middle layers, token copiers appear sporadically in later layers, and the correlation between the two scores is negligible or negative.
Attention Dynamics: Next-Token vs Last-Token Matching
The study extends attention analysis using value-weighted matching scores [kobayashi2020attention]. Token induction heads consistently attend to the next token of repeated spans, while concept induction heads attend to the last token of a multi-token concept. The correlation between concept copying scores and last-token matching scores is robust across models (e.g., r=0.44 for Llama-2-7b, p<0.001), reinforcing the functional distinction in heads’ roles.
Additionally, some “mixed” heads display both types of matching, acting as general copiers of the next salient lexical unit regardless of its tokenized length.
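The value-weighted matching idea can be sketched with toy numbers. Following the Kobayashi et al. style of reweighting cited above, attention weights are scaled by value-vector norms before measuring how much mass lands on a given position; the attention pattern, value norms, and position indices below are invented for illustration.

```python
import numpy as np

def value_weighted_attention(attn: np.ndarray, v_norms: np.ndarray) -> np.ndarray:
    """Reweight attention by value-vector norms, then renormalize to sum to 1."""
    w = attn * v_norms
    return w / w.sum()

def matching_score(weights: np.ndarray, positions: list[int]) -> float:
    """Total value-weighted attention mass on the given positions."""
    return float(weights[positions].sum())

# Toy sequence: a repeated multi-token concept occupies positions 3..5.
# For the current query, position 5 is the concept's *last* token and
# position 6 is the *next* token after the repeated span.
attn = np.array([0.05, 0.05, 0.05, 0.10, 0.10, 0.30, 0.35])
v_norms = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0])
w = value_weighted_attention(attn, v_norms)

# A concept-head-like pattern puts most mass on the concept's last token;
# a token-head-like pattern would favor the next token instead.
print(matching_score(w, [5]) > matching_score(w, [6]))  # True
```

In the paper's analysis, correlating these per-head matching scores with the copying scores is what ties the attention behavior to the causal role.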
Ablation Experiments: Semantic vs Verbatim Copying
Targeted mean ablations of top token and concept induction heads are performed on “vocabulary list” tasks incorporating translation, synonyms, antonyms, capitalization, and verbatim copying. The results are stark:
- Ablation of Token Induction Heads: Destroys verbatim copying for nonsense tokens, induces paraphrasing rather than literal copying, but leaves semantic tasks largely intact.
- Ablation of Concept Induction Heads: Dramatically reduces performance in translation and synonym/antonym tasks, but leaves surface-level copying unscathed.
In ambiguous tasks (English copying and uppercasing), performance remains high unless both head types are ablated, indicating redundancy between the routes. Notably, a similar dissociation persists even when controlling for word length (single-token concepts), though the effects are muted.
Qualitative outputs show paraphrasing replaces verbatim copying when token induction heads are ablated, including rewriting Python loops via list comprehensions, confirming the hypothesized semantic route’s operation.
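Mean ablation, the intervention used throughout these experiments, replaces a head's output with its average activation over a reference corpus rather than zeroing it. A minimal sketch, with toy tensors standing in for real head outputs and a zero vector standing in for the empirical mean:

```python
import numpy as np

def mean_ablate(head_outputs: np.ndarray, head_idx: int,
                mean_activation: np.ndarray) -> np.ndarray:
    """Replace one head's output with its dataset-mean activation."""
    out = head_outputs.copy()
    out[head_idx] = mean_activation
    return out

# Toy layer with 4 heads, d_head = 3.
rng = np.random.default_rng(0)
heads = rng.normal(size=(4, 3))
mean_act = np.zeros(3)  # stand-in for the empirical mean over a corpus

ablated = mean_ablate(heads, head_idx=2, mean_activation=mean_act)
print(np.allclose(ablated[2], mean_act))   # ablated head is replaced
print(np.allclose(ablated[0], heads[0]))   # other heads are untouched
```

Ablating the top-k token heads versus the top-k concept heads with this operation is what produces the dissociation reported above.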
Concept Heads and Semantic Representation
The paper introduces a “concept lens”—the sum of OV matrices from top concept induction heads—applied to hidden states. Projection to vocabulary space demonstrates that context-sensitive semantic representations emerge (e.g., the token “inals” resolves to different meanings—Cardinals as a team, a bird, or clergy—depending on context), beyond surface token patterns.
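The concept-lens computation reduces to two matrix products: sum the OV matrices of the top concept heads, apply the sum to a hidden state, and unembed into vocabulary space. A shape-level sketch with toy dimensions (`d_model=4`, `vocab=6`) and random matrices standing in for trained weights:

```python
import numpy as np

def concept_lens(hidden: np.ndarray, ov_mats: list[np.ndarray],
                 W_U: np.ndarray) -> np.ndarray:
    """Apply the summed OV circuit of top concept heads, then unembed.

    hidden:  (d_model,) hidden state at a token position
    ov_mats: list of (d_model, d_model) per-head OV matrices
    W_U:     (d_model, vocab) unembedding matrix
    returns: (vocab,) logits under the concept lens
    """
    ov_sum = sum(ov_mats)            # (d_model, d_model)
    return (hidden @ ov_sum) @ W_U   # (vocab,)

# Toy dimensions: d_model=4, vocab=6, two "concept heads".
rng = np.random.default_rng(1)
ovs = [rng.normal(size=(4, 4)) for _ in range(2)]
W_U = rng.normal(size=(4, 6))
h = rng.normal(size=(4,))

logits = concept_lens(h, ovs, W_U)
print(logits.shape)  # (6,)
```

With real model weights, inspecting the top tokens of these logits at a position like “inals” is what reveals the context-dependent meanings described above.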
Language-Agnostic Semantic Transfer
A crucial experiment involves patching concept head activations from translation prompts in one language pair (e.g., Spanish-Italian) into another (Japanese-Chinese), using multi-token generation techniques [fiottokaufman2025nnsight]. The patched model outputs the source concept (e.g., “child”) in the base output language (Chinese), with accuracy (≈0.40) comparable to original translation performance (≈0.48). This demonstrates that concept induction heads encode abstract, language-independent meaning—a strong claim supported by observed activation invariance across languages. The effect strengthens as more heads are patched (up to k=80), consistent with the ablation results.
Function Vector Heads: Complementary Mechanisms
The authors contrast concept induction heads with function vector (FV) heads [todd2024function], which encode task logic (e.g., output language for translation). Although weak correlations are observed between concept copying and FV scores, patching FV heads switches output language without leaking semantics, reinforcing functional orthogonality. Ablation of FV heads impairs non-verbatim tasks, underscoring their necessity for task execution.
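The claimed orthogonality can be caricatured as a factorization of the output into a "task" component (FV heads) and a "meaning" component (concept heads). The function, dictionary, and factorization below are purely illustrative, not the paper's mechanism: the point is only that swapping one component changes one aspect of the output without leaking the other.

```python
def generate(task: str, concept: str) -> str:
    """Toy generator whose output factors into task x concept."""
    translations = {
        ("to_french", "dog"): "chien",   ("to_french", "child"): "enfant",
        ("to_spanish", "dog"): "perro",  ("to_spanish", "child"): "niño",
    }
    return translations[(task, concept)]

base = generate("to_french", "dog")               # "chien"
fv_patched = generate("to_spanish", "dog")        # swap task only -> language changes
concept_patched = generate("to_french", "child")  # swap meaning only -> word changes
print(base, fv_patched, concept_patched)  # chien perro enfant
```

Patching FV heads corresponds to the first swap; patching concept heads to the second.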
Theoretical and Practical Implications
The dual-route model provides a granular mechanistic lens for understanding the emergence and separation of literal and semantic copying in transformer LLMs. Concept induction heads enable LLMs to perform semantic tasks such as translation and paraphrasing, while token induction heads facilitate verbatim copying and pattern matching. The independence and complementarity of these circuits suggest that in-context learning arises from parallel, specialized attention mechanisms, not a uniform process.
This framework offers actionable avenues for interpretability—selective ablation can steer generative behavior (e.g., paraphrase-vs-copy control), and activation patching enables language-agnostic conceptual transfer. Furthermore, the model predicts that scaling and training dynamics may shape the development and separation of token and concept heads, with implications for multilingual, low-resource, and domain-adaptive LLMs.
Conclusion
By combining causal patching, attention analysis, and ablation, the Dual-Route Model of Induction (2504.03022) establishes that LLMs possess distinct token and concept induction heads, functioning in parallel to enable verbatim and semantic copying. Concept induction heads encode language-agnostic word representations and mediate tasks requiring abstract meaning transfer, exhibiting strong empirical separation from literal token copiers. The findings enrich our mechanistic understanding of LLMs’ internal structures and their ability to learn and manipulate lexical information, offering robust frameworks for future interpretability and controllability research.