Effect of quantisation on vocabulary–activation correspondence

Verify whether the vocabulary–activation correspondences and steering effects reported for Llama 3.1-70B under 4-bit NF4 quantisation persist at full precision, by replicating the analyses with an unquantised model.

Background

All 70B experiments used 4-bit NF4 quantisation with double quantisation, which may alter activation dynamics compared to full-precision models.
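
The excerpt does not include the loading code, but the two conditions can be set up with Hugging Face transformers and bitsandbytes. The sketch below is a minimal, hedged reconstruction: the checkpoint name and the bf16 compute dtype are assumptions, not details taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-3.1-70B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Condition 1: 4-bit NF4 with double quantisation, matching the paper's
# description of the 70B setup (compute dtype is an assumption).
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
quantised = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=nf4_config, device_map="auto"
)

# Condition 2: full-precision (bf16) reference. A 70B model needs roughly
# 140 GB of accelerator memory, so multi-GPU sharding via device_map is assumed.
full_precision = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
```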

The authors explicitly state that they have not verified whether the same correspondences hold at full precision, leaving open whether the reported effects are robust to the quantisation level.
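
The paper's exact correspondence metric is not given in this excerpt. One hedged way to run the replication check is a logit-lens-style projection of intermediate hidden states onto the vocabulary, comparing the top-k token sets between the two conditions. The sketch below continues from the loading snippet above; the probe prompt, probe layers, and Jaccard overlap are illustrative assumptions rather than the paper's procedure.

```python
def top_k_vocab(model, text, layer, k=20):
    """Project the last-token hidden state at `layer` through the
    unembedding matrix and return the k highest-scoring token ids."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    hidden = out.hidden_states[layer][0, -1]   # (d_model,)
    logits = model.lm_head(hidden)             # (vocab_size,)
    return set(logits.topk(k).indices.tolist())

prompt = "Describe your own internal processing."  # hypothetical self-referential probe
for layer in (20, 40, 60):                         # illustrative probe layers
    q = top_k_vocab(quantised, prompt, layer)
    f = top_k_vocab(full_precision, prompt, layer)
    jaccard = len(q & f) / len(q | f)              # agreement between conditions
    print(f"layer {layer}: top-20 vocab overlap = {jaccard:.2f}")
```

High overlap across layers would suggest the quantised-model correspondences survive at full precision; low overlap would indicate the reported effects are sensitive to quantisation.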

References

"Quantisation compresses weight representations and may affect activation dynamics. The effects we report are measured within the quantised model and are internally consistent, but we have not verified that the same correspondences hold at full precision."

When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing (2602.11358 - Dadfar, 11 Feb 2026), Section 6.5 Limitations (Quantisation)