Generality of late-layer suppression in much larger models

Determine whether the late-layer suppression of correct count tokens (implemented by penultimate- and final-layer MLPs and, in some cases, final-layer attention heads during the character-counting task) also appears in much larger language models such as GPT-4 and Mistral when they are evaluated on the same character-counting prompts.

Background

The paper shows that across LLaMA, Qwen, and Gemma families (2–3B and 7–9B parameters), character-count information is encoded in early and mid layers but is actively suppressed by late components, especially penultimate and final MLPs and sometimes final-layer attention heads, leading to incorrect outputs.

The authors analyze only models up to 9B parameters. They explicitly note that this analysis does not establish whether the same suppression behavior occurs in other systems, such as the proprietary GPT-4 or the Mistral family, leaving the question open.
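One way such a generalization study could proceed on any open-weight model is a logit-lens-style sweep: project each layer's residual stream through the unembedding and track the correct count token's logit across depth, flagging layers where it drops sharply. The sketch below is a toy illustration with fabricated tensors (the synthetic unembedding, residual streams, and the injected "count direction" are all assumptions, not the paper's data); in a real experiment the residual streams would come from cached activations of the model under test.

```python
import numpy as np

# Toy logit-lens-style analysis. All tensors here are fabricated for
# illustration; in a real experiment `resid` would hold cached per-layer
# residual streams and `W_U` the model's actual unembedding matrix.
rng = np.random.default_rng(0)

n_layers, d_model, vocab = 12, 64, 100
correct_token = 7  # hypothetical token id for the correct count

W_U = rng.normal(size=(d_model, vocab))        # synthetic unembedding
resid = rng.normal(size=(n_layers, d_model)) * 0.1

# Inject a "count direction" whose strength grows through mid layers and
# collapses in the last two layers, mimicking the suppression pattern
# the paper reports for penultimate/final components.
count_dir = W_U[:, correct_token] / np.linalg.norm(W_U[:, correct_token])
strength = np.concatenate([np.linspace(0, 3, n_layers - 2), [1.0, 0.2]])
resid = resid + strength[:, None] * count_dir[None, :]

layer_logits = resid @ W_U                     # shape (n_layers, vocab)
correct_logit = layer_logits[:, correct_token]

# Per-layer change in the correct token's logit; large negative deltas
# concentrated in the final layers indicate late-layer suppression.
deltas = np.diff(correct_logit)
suppressing_layers = [i + 1 for i, d in enumerate(deltas) if d < -0.5]
print("correct-token logit by layer:", np.round(correct_logit, 2))
print("layers with suppression:", suppressing_layers)
```

By construction the suppression shows up in the last two layers here; the point is only the analysis shape, which transfers directly to larger open-weight models (for GPT-4, where activations are inaccessible, only behavioral probes on the same prompts are possible).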

References

These sizes make interpretability experiments more manageable, but they don’t tell us whether the same behavior appears in much larger systems like GPT-4~\citep{openai2024gpt4technicalreport} or Mistral~\citep{jiang2023mistral7b}.

From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks  (2604.00778 - Datta et al., 1 Apr 2026) in Section: Limitations