Conjecture on where procedural knowledge resides for ReasonCache versus SFT

Determine whether ReasonCache stores procedural knowledge directly in its learned key–value cache, so that this knowledge need not be externalized into generated token sequences, whereas supervised fine-tuning re-externalizes weight-encoded procedural knowledge as explicit tokens during generation, producing unnecessarily verbose reasoning chains.

Background

Empirically, the authors observe that ReasonCache achieves higher accuracy than supervised fine-tuning on GPQA-Diamond while generating substantially shorter reasoning chains. They propose a mechanistic explanation: supervised fine-tuning may externalize procedural knowledge into explicit tokens during inference, while ReasonCache may encode such knowledge directly in its KV-prefix, reducing the need for long chains.
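The mechanistic distinction can be made concrete with a small sketch. The following is a minimal illustration, not the paper's implementation: it shows how a hypothetical learned KV-prefix (trained key/value vectors prepended to the cache, as in prefix-tuning-style methods) can influence attention outputs without any corresponding generated tokens, whereas an SFT model can only inject such knowledge by emitting tokens whose keys and values then enter the cache. All array shapes and names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    d = q.shape[-1]
    return softmax(q @ K.T / np.sqrt(d)) @ V

rng = np.random.default_rng(0)
d, n_ctx, n_prefix = 16, 8, 4

# Keys/values produced by the actual prompt tokens.
K_ctx = rng.normal(size=(n_ctx, d))
V_ctx = rng.normal(size=(n_ctx, d))

# Hypothetical learned KV-prefix: trained cache entries that no token
# ever produced and that are never decoded into output text.
K_pre = rng.normal(size=(n_prefix, d))
V_pre = rng.normal(size=(n_prefix, d))

q = rng.normal(size=(d,))

# SFT-style baseline: the query can only attend to token-derived entries.
out_base = attend(q, K_ctx, V_ctx)

# KV-prefix: the same query also attends to the learned cache entries,
# so their content shapes the output with zero extra generated tokens.
out_prefix = attend(q, np.vstack([K_pre, K_ctx]), np.vstack([V_pre, V_ctx]))

print(out_base.shape, out_prefix.shape)
```

Under this picture, an SFT model that has procedural knowledge only in its weights must "write it out" as tokens for that knowledge to occupy the cache and steer later steps, which is one way the observed verbosity gap could arise.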

This explanation is stated as a conjecture in the figure caption, indicating an unresolved hypothesis about the internal representation of procedural knowledge under different adaptation strategies.

References

"Our conjecture is that in contrast, ReasonCache stores procedural knowledge directly in the KV cache, eliminating the need for explicit externalization."

ReasonCACHE: Teaching LLMs To Reason Without Weight Updates (2602.02366 - Gupta et al., 2 Feb 2026), figure caption for Fig. 'verbosity', Section 3.3 (Inference Efficiency)