- The paper introduces dual induction heads, distinguishing between token-level copying and concept-level semantic transfer in LLMs.
- Causal patching and ablation experiments reveal that concept heads drive semantic tasks like translation and synonym generation while token heads support verbatim copying.
- Findings underscore a mechanistic dual-route framework that informs interpretability, enables generative control, and guides multilingual model design.
The Dual-Route Model of Induction in LLMs
Background and Motivation
The study introduces a new mechanistic account of copying and semantic transfer in LLMs—the Dual-Route Model of Induction. While prior work has established the existence of “induction heads”, specialized attention heads responsible for in-context token copying [elhage2021mathematical], this paper identifies and characterizes a novel class: concept-level induction heads, which operate on lexical units spanning multiple tokens. The work is deeply inspired by the dual-route model of human reading, which distinguishes between sublexical (grapheme-level, phonological) and lexical (whole-word, semantic) processing routes [marshallnewcombe1966]. Applying this analogy, the authors explore how LLMs encode and manipulate both literal token sequences and abstract concepts, with substantial implications for interpretability and control of generative behavior.
Methodology: Identification and Causal Analysis of Induction Heads
The paper operationalizes “concept induction heads” through causal intervention techniques—patching individual attention head activations across contexts and measuring their impact on predicted token probabilities. The authors define two distinct scores for each head:
- Concept Copying Score: Quantifies the increase in future token probability when patching the head in prompts containing multi-token concepts sampled from the CounterFact dataset [meng2022locating].
- Token Copying Score: Evaluates the same intervention for random tokens, tracking verbatim token-level copying.
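The scoring logic described above can be sketched as a probability-delta computation. This is a minimal toy illustration of the bookkeeping only, not the paper's implementation: the real experiments patch attention-head activations inside a transformer, whereas here `p_clean` and `p_patched` are hand-made toy distributions and `copying_score` is a hypothetical helper name.

```python
import numpy as np

def copying_score(p_patched: np.ndarray, p_clean: np.ndarray, target_ids: list[int]) -> float:
    """Mean increase in target-token probability caused by patching one head.

    Concept copying scores use multi-token concept targets; token copying
    scores use random-token targets -- the delta computation is the same.
    """
    return float(np.mean([p_patched[t] - p_clean[t] for t in target_ids]))

# Toy vocabulary of 5 tokens; patching a head raises the probability of token 2.
p_clean = np.array([0.2, 0.2, 0.1, 0.3, 0.2])
p_patched = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

concept_score = copying_score(p_patched, p_clean, target_ids=[2])
print(round(concept_score, 2))  # 0.5
```

Ranking heads by the two scores separately is what surfaces the two distinct populations described below.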
Heads are ranked and analyzed across several models (Llama-2-7b, Llama-3-8b, OLMo-2-7b, Pythia-6.9b), and the analysis consistently finds a strong separation between concept and token induction heads: concept copiers cluster in early-to-middle layers, token copiers appear sporadically in later layers, and the correlation between the two scores is negligible or negative.
Attention Dynamics: Next-Token vs Last-Token Matching
The study extends attention analysis using value-weighted matching scores [kobayashi2020attention]. Token induction heads consistently attend to the next token of repeated spans, while concept induction heads attend to the last token of a multi-token concept. The correlation between concept copying scores and last-token matching scores is robust across models (e.g., r=0.44 for Llama-2-7b, p<0.001), reinforcing the functional distinction in heads’ roles.
Additionally, some “mixed” heads display both types of matching, acting as general copiers of the next salient lexical unit regardless of its tokenized length.
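The value-weighted matching idea can be sketched with toy numbers. Following the Kobayashi et al. style of reweighting cited above, attention weights are scaled by value-vector norms before measuring how much mass lands on a given position; the attention pattern, value norms, and position indices below are invented for illustration.

```python
import numpy as np

def value_weighted_attention(attn: np.ndarray, v_norms: np.ndarray) -> np.ndarray:
    """Reweight attention by value-vector norms, then renormalize to sum to 1."""
    w = attn * v_norms
    return w / w.sum()

def matching_score(weights: np.ndarray, positions: list[int]) -> float:
    """Total value-weighted attention mass on the given positions."""
    return float(weights[positions].sum())

# Toy sequence: a repeated multi-token concept occupies positions 3..5.
# For the current query, position 5 is the concept's *last* token and
# position 6 is the *next* token after the repeated span.
attn = np.array([0.05, 0.05, 0.05, 0.10, 0.10, 0.30, 0.35])
v_norms = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0])
w = value_weighted_attention(attn, v_norms)

# A concept-head-like pattern puts most mass on the concept's last token;
# a token-head-like pattern would favor the next token instead.
print(matching_score(w, [5]) > matching_score(w, [6]))  # True
```

In the paper's analysis, correlating these per-head matching scores with the copying scores is what ties the attention behavior to the causal role.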
Ablation Experiments: Semantic vs Verbatim Copying
Targeted mean ablations of top token and concept induction heads are performed on “vocabulary list” tasks incorporating translation, synonyms, antonyms, capitalization, and verbatim copying. The results are stark:
- Ablation of Token Induction Heads: Destroys verbatim copying for nonsense tokens, induces paraphrasing rather than literal copying, but leaves semantic tasks largely intact.
- Ablation of Concept Induction Heads: Dramatically reduces performance in translation and synonym/antonym tasks, but leaves surface-level copying unscathed.
In ambiguous tasks (English copying and uppercasing), performance remains high unless both head types are ablated, indicating redundancy between the routes. Notably, a similar dissociation persists even when controlling for word length (single-token concepts), though the effects are muted.
Qualitative outputs show paraphrasing replaces verbatim copying when token induction heads are ablated, including rewriting Python loops via list comprehensions, confirming the hypothesized semantic route’s operation.
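Mean ablation, the intervention used throughout these experiments, replaces a head's output with its average activation over a reference corpus rather than zeroing it. A minimal sketch, with toy tensors standing in for real head outputs and a zero vector standing in for the empirical mean:

```python
import numpy as np

def mean_ablate(head_outputs: np.ndarray, head_idx: int,
                mean_activation: np.ndarray) -> np.ndarray:
    """Replace one head's output with its dataset-mean activation."""
    out = head_outputs.copy()
    out[head_idx] = mean_activation
    return out

# Toy layer with 4 heads, d_head = 3.
rng = np.random.default_rng(0)
heads = rng.normal(size=(4, 3))
mean_act = np.zeros(3)  # stand-in for the empirical mean over a corpus

ablated = mean_ablate(heads, head_idx=2, mean_activation=mean_act)
print(np.allclose(ablated[2], mean_act))   # ablated head is replaced
print(np.allclose(ablated[0], heads[0]))   # other heads are untouched
```

Ablating the top-k token heads versus the top-k concept heads with this operation is what produces the dissociation reported above.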
Concept Heads and Semantic Representation
The paper introduces a “concept lens”—the sum of OV matrices from top concept induction heads—applied to hidden states. Projection to vocabulary space demonstrates that context-sensitive semantic representations emerge (e.g., the token “inals” resolves to different meanings—Cardinals as a team, a bird, or clergy—depending on context), beyond surface token patterns.
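The concept-lens computation reduces to two matrix products: sum the OV matrices of the top concept heads, apply the sum to a hidden state, and unembed into vocabulary space. A shape-level sketch with toy dimensions (`d_model=4`, `vocab=6`) and random matrices standing in for trained weights:

```python
import numpy as np

def concept_lens(hidden: np.ndarray, ov_mats: list[np.ndarray],
                 W_U: np.ndarray) -> np.ndarray:
    """Apply the summed OV circuit of top concept heads, then unembed.

    hidden:  (d_model,) hidden state at a token position
    ov_mats: list of (d_model, d_model) per-head OV matrices
    W_U:     (d_model, vocab) unembedding matrix
    returns: (vocab,) logits under the concept lens
    """
    ov_sum = sum(ov_mats)            # (d_model, d_model)
    return (hidden @ ov_sum) @ W_U   # (vocab,)

# Toy dimensions: d_model=4, vocab=6, two "concept heads".
rng = np.random.default_rng(1)
ovs = [rng.normal(size=(4, 4)) for _ in range(2)]
W_U = rng.normal(size=(4, 6))
h = rng.normal(size=(4,))

logits = concept_lens(h, ovs, W_U)
print(logits.shape)  # (6,)
```

With real model weights, inspecting the top tokens of these logits at a position like “inals” is what reveals the context-dependent meanings described above.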
Language-Agnostic Semantic Transfer
A crucial experiment involves patching concept head activations from translation prompts in one language pair (e.g., Spanish-Italian) into another (Japanese-Chinese), using multi-token generation techniques [fiottokaufman2025nnsight]. The patched model outputs the source concept (e.g., “child”) in the base output language (Chinese), with accuracy (≈0.40) comparable to original translation performance (≈0.48). This demonstrates that concept induction heads encode abstract, language-independent meaning—a strong claim supported by observed activation invariance across languages. The effect strengthens as more heads are patched (up to k=80), consistent with the ablation results.
Function Vector Heads: Complementary Mechanisms
The authors contrast concept induction heads with function vector (FV) heads [todd2024function], which encode task logic (e.g., output language for translation). Although weak correlations are observed between concept copying and FV scores, patching FV heads switches output language without leaking semantics, reinforcing functional orthogonality. Ablation of FV heads impairs non-verbatim tasks, underscoring their necessity for task execution.
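The claimed orthogonality can be caricatured as a factorization of the output into a "task" component (FV heads) and a "meaning" component (concept heads). The function, dictionary, and factorization below are purely illustrative, not the paper's mechanism: the point is only that swapping one component changes one aspect of the output without leaking the other.

```python
def generate(task: str, concept: str) -> str:
    """Toy generator whose output factors into task x concept."""
    translations = {
        ("to_french", "dog"): "chien",   ("to_french", "child"): "enfant",
        ("to_spanish", "dog"): "perro",  ("to_spanish", "child"): "niño",
    }
    return translations[(task, concept)]

base = generate("to_french", "dog")               # "chien"
fv_patched = generate("to_spanish", "dog")        # swap task only -> language changes
concept_patched = generate("to_french", "child")  # swap meaning only -> word changes
print(base, fv_patched, concept_patched)  # chien perro enfant
```

Patching FV heads corresponds to the first swap; patching concept heads to the second.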
Theoretical and Practical Implications
The dual-route model provides a granular mechanistic lens for understanding the emergence and separation of literal and semantic copying in transformer LLMs. Concept induction heads enable LLMs to perform semantic tasks such as translation and paraphrasing, while token induction heads facilitate verbatim copying and pattern matching. The independence and complementarity of these circuits suggest that in-context learning arises from parallel, specialized attention mechanisms, not a uniform process.
This framework offers actionable avenues for interpretability—selective ablation can steer generative behavior (e.g., paraphrase-vs-copy control), and activation patching enables language-agnostic conceptual transfer. Furthermore, the model predicts that scaling and training dynamics may shape the development and separation of token and concept heads, with implications for multilingual, low-resource, and domain-adaptive LLMs.
Conclusion
By combining causal patching, attention analysis, and ablation, the Dual-Route Model of Induction (2504.03022) establishes that LLMs possess distinct token and concept induction heads, functioning in parallel to enable verbatim and semantic copying. Concept induction heads encode language-agnostic word representations and mediate tasks requiring abstract meaning transfer, exhibiting strong empirical separation from literal token copiers. The findings enrich our mechanistic understanding of LLMs’ internal structures and their ability to learn and manipulate lexical information, offering robust frameworks for future interpretability and controllability research.