Generality of late-layer suppression in much larger models
Determine whether the late-layer suppression of correct count tokens (implemented by penultimate- and final-layer MLPs and, in some cases, by final-layer attention heads during the character-counting task) also appears in much larger language models such as GPT-4 and Mistral when evaluated on the same character-counting prompts.
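A minimal sketch of how this check could begin for an open-weights model is shown below, using direct logit attribution in TransformerLens: decompose the final residual stream into per-layer attention and MLP contributions and project each onto the correct count token, so that strongly negative values in the last layers would signal the same suppression. The prompt, the "mistral-7b" model alias, and the assumption that " 3" tokenizes to a single token are illustrative, not taken from the paper; GPT-4 is API-only, so only the behavioral half of the comparison is possible there.

```python
from transformer_lens import HookedTransformer

# Load an open-weights larger model; "mistral-7b" is assumed to be
# TransformerLens's alias for mistralai/Mistral-7B-v0.1.
model = HookedTransformer.from_pretrained("mistral-7b")

# Hypothetical character-counting prompt; the paper's exact prompt
# format is not reproduced here.
prompt = 'How many times does the letter "r" appear in "strawberry"? Answer:'
answer_token = model.to_single_token(" 3")  # assumes " 3" is one token

_, cache = model.run_with_cache(prompt)

# Direct logit attribution: decompose the final residual stream at the
# last position into per-component contributions, apply the final
# LayerNorm, and project onto the correct count token. Negative values
# at the last layers would indicate late-layer suppression.
resid_stack, labels = cache.decompose_resid(
    layer=-1, pos_slice=-1, apply_ln=True, return_labels=True
)
contributions = resid_stack[:, 0] @ model.W_U[:, answer_token]
for label, value in zip(labels, contributions):
    print(f"{label:>12s}: {value.item():+.3f}")
```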
References
These sizes make interpretability experiments more manageable, but they don't tell us whether the same behavior appears in much larger systems like GPT-4~\citep{openai2024gpt4technicalreport} or Mistral~\citep{jiang2023mistral7b}.
— From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks
(2604.00778 - Datta et al., 1 Apr 2026) in Section: Limitations