Test-time plastic KV-cache for continual learning

Develop methods for prefix-tuned transformers, such as ReasonCache, that keep the key-value (KV) cache learnable at test time, enabling true continual learning without modifying the pretrained backbone parameters.

Background

The paper shows that ReasonCache (an instantiation of prefix tuning) can teach LLMs to reason without weight updates by learning compact key–value prefixes across attention layers. However, as studied here, the learned prefixes are trained offline and then frozen at deployment, so the approach does not support continual adaptation at test time.

In the Related Work section, the authors explicitly flag test-time plasticity of the KV cache, which would let the model continue learning during deployment, as an open problem: moving beyond static, pre-trained prefixes toward true continual learning remains unresolved.

References

Importantly, prefix tuning as studied here is not a test-time learner: the prefix is trained offline and frozen at deployment. Developing methods where the KV-cache remains plastic at test time, enabling true continual learning, remains an open problem.

ReasonCACHE: Teaching LLMs To Reason Without Weight Updates (2602.02366 - Gupta et al., 2 Feb 2026) in Section 5 (Related Work)