Causal verification of the introspection mechanism in Qwen 2.5-32B
Establish whether the mechanism underlying vocabulary–activation correspondence in Qwen 2.5-32B is the same introspection-direction mechanism identified in Llama 3.1. Doing so requires extracting a Qwen-specific introspection direction at Layer 8, performing causal steering along that direction, and characterising the resulting dose–response effects.
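The three steps named above (direction extraction, steering, dose–response) can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the source paper's method: the difference-of-means estimator, the helper names (`extract_direction`, `steer`, `dose_response`), the hidden size, and the projection readout are all hypothetical stand-ins. In a real experiment the activations would come from Layer 8 of the model and the readout would be a behavioural measure such as vocabulary usage.

```python
import numpy as np

def extract_direction(pos_acts, neg_acts):
    """Unit-norm difference of class means (an assumed estimator)."""
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(hidden, direction, alpha):
    """Add alpha times the direction to every hidden-state row."""
    return hidden + alpha * direction

def dose_response(hidden, direction, alphas, readout):
    """Evaluate the readout on steered activations at each strength."""
    return [float(readout(steer(hidden, direction, a))) for a in alphas]

# Synthetic stand-in for residual-stream activations (assumption:
# real data would be collected from the model, not sampled).
rng = np.random.default_rng(0)
d_model = 64  # hypothetical hidden size, far smaller than the real one
true_dir = np.zeros(d_model)
true_dir[0] = 1.0

neg = rng.normal(size=(200, d_model))                   # control prompts
pos = rng.normal(size=(200, d_model)) + 3.0 * true_dir  # self-referential prompts

direction = extract_direction(pos, neg)

# Placeholder readout: mean projection onto the direction. This is
# linear in alpha by construction, so it only demonstrates the
# bookkeeping, not a causal effect on behaviour.
alphas = [-4.0, -2.0, 0.0, 2.0, 4.0]
curve = dose_response(neg[:50], direction, alphas,
                      lambda h: (h @ direction).mean())

for a, y in zip(alphas, curve):
    print(f"alpha={a:+.1f}  readout={y:.3f}")
```

A causal claim would rest on the behavioural readout changing monotonically with `alpha` while steering along matched control directions (for example, random unit vectors) leaves it unchanged.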
References
Our Qwen experiments are observational: we establish correspondence but do not extract a Qwen-specific introspection direction or test causal steering. The three Qwen correspondences survive all statistical controls and the descriptive control, but without causal intervention, we cannot confirm that the mechanism identified in Llama is the same one operating in Qwen.
— When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing
(2602.11358 - Dadfar, 11 Feb 2026) in Section 5.6 Cross-Architecture Replication (Causal gap)