Mechanism equivalence between closed-source frontier models and open-weight models
Determine whether the activation-space directions identified in the open-weight models Llama 3.1 and Qwen 2.5-32B correspond to the same mechanisms that produce the behavioural signatures of extended self-examination observed in the closed-source frontier models Claude Opus 4.5 and ChatGPT 5.2.
References
We cannot directly verify that the directions identified in Llama and Qwen are the same mechanisms producing the behavioural signatures in Claude and GPT.
Source: When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing (Dadfar, 2602.11358, 11 Feb 2026), Section 6.5 Limitations (Closed-model / open-weight gap)