Does cached retrieval for verbal confidence constitute genuine introspection?

Determine whether the cached retrieval mechanism identified in Gemma 3 27B and Qwen 2.5 7B, in which confidence is computed during answer generation, cached at the post-answer-newline token, and later retrieved by the confidence-colon token for verbalization, qualifies as introspection under rigorous criteria for introspective awareness, or instead merely reflects retrieval of a precomputed internal signal.

Background

The paper presents evidence that LLMs such as Gemma 3 27B and Qwen 2.5 7B compute verbal confidence via a cached retrieval mechanism: confidence information is gathered from answer tokens, consolidated at the post-answer-newline (PANL) position, and later retrieved at the confidence-colon (CC) position for output. Causal interventions (steering, patching, noising, swaps) and attention blocking collectively rule out just-in-time computation at CC and map the information flow from answer tokens to PANL to CC.
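The retrieval claim above rests on interventions like activation patching: if copying the cached state at PANL from one run into another flips the confidence later verbalized at CC, then CC is reading a cache, not recomputing. A minimal toy sketch of that logic (positions, dimensions, and the sigmoid confidence head here are all hypothetical illustrations, not the paper's actual setup):

```python
import math

# Toy activation-patching sketch. Idea: copy the hidden state at the
# post-answer-newline (PANL) position from a high-confidence run into a
# low-confidence run. If the confidence read out at the confidence-colon
# (CC) position follows the patched state, CC retrieves cached information
# rather than computing it just-in-time.

def read_confidence_at_cc(hidden_states, cc_query):
    """CC attends over earlier positions, then decodes a scalar confidence."""
    scores = [sum(h * q for h, q in zip(row, cc_query)) for row in hidden_states]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [sum(w * row[i] for w, row in zip(weights, hidden_states))
              for i in range(len(cc_query))]
    return 1.0 / (1.0 + math.exp(-pooled[0]))  # toy sigmoid confidence head

cc_query = [0.0, 5.0, 0.0, 0.0]          # attends to the PANL marker (dim 1)
run_high = [[0.0] * 4 for _ in range(5)]
run_high[3][1] = 1.0                     # position 3 plays the PANL role
run_high[3][0] = 4.0                     # cached "high confidence" signal
run_low = [row[:] for row in run_high]
run_low[3][0] = -4.0                     # same run, but low cached confidence

base_high = read_confidence_at_cc(run_high, cc_query)
base_low = read_confidence_at_cc(run_low, cc_query)

patched = [row[:] for row in run_low]
patched[3] = run_high[3][:]              # patch only the PANL activation
patched_conf = read_confidence_at_cc(patched, cc_query)

print(f"high={base_high:.2f} low={base_low:.2f} patched={patched_conf:.2f}")
```

Because the two runs differ only at the PANL position, the patched low-confidence run verbalizes high confidence, mirroring the swap and patching logic used to rule out just-in-time computation at CC.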

Beyond timing and localization, the authors show that representations at PANL and CC explain variance in verbal confidence beyond token log-probabilities, suggesting a richer, second-order evaluation of question–answer fit. While these findings indicate a form of metacognitive capability and align with reports of introspective awareness in LLMs, the authors explicitly note that it remains unresolved whether the identified retrieval process satisfies a stronger notion of introspection.
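The "beyond log-probabilities" claim amounts to a variance-explained comparison: regress verbal confidence on token log-probability alone, then add a representation-derived feature and check whether R-squared rises. A synthetic sketch of that comparison, with made-up data and illustrative feature names (nothing here is the paper's actual probe or dataset):

```python
import math
import random

# Hypothetical sketch: does a hidden-state feature (stand-in for a PANL/CC
# probe readout) predict verbal confidence beyond the answer's token
# log-probability? Data is synthetic by construction.
random.seed(0)
n = 200
logprob = [random.gauss(0, 1) for _ in range(n)]
hidden = [random.gauss(0, 1) for _ in range(n)]   # representation feature
# Verbal confidence depends on both, so log-prob alone is insufficient.
conf = [0.5 * lp + 0.8 * h + random.gauss(0, 0.3)
        for lp, h in zip(logprob, hidden)]

def ols_predict(features, y):
    """Least-squares fit with intercept via normal equations."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    for i in range(k):                            # forward elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            for c in range(k):
                A[j][c] -= f * A[i][c]
            b[j] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):                  # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j]
                              for j in range(i + 1, k))) / A[i][i]
    return [sum(bb * xx for bb, xx in zip(beta, x)) for x in X]

def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

r2_base = r_squared(conf, ols_predict([[lp] for lp in logprob], conf))
r2_full = r_squared(conf, ols_predict(list(zip(logprob, hidden)), conf))
print(f"R^2 log-prob only: {r2_base:.2f}; with hidden feature: {r2_full:.2f}")
```

In this synthetic setting the augmented model explains substantially more variance, which is the shape of evidence the authors use to argue for a second-order evaluation of question-answer fit rather than a mere readout of token probabilities.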

References

This is consistent with recent evidence suggesting that LLMs possess some degree of introspective awareness \citep{anthropic2025introspection}, though whether the retrieval process we characterize constitutes introspection in a stronger sense remains an open question.

How do LLMs Compute Verbal Confidence (2603.17839 - Kumaran et al., 18 Mar 2026) in Conclusion