Productionizing activation capping and exploring preventative training-time steering
Develop practical, scalable methods for running activation capping in production at inference time (that is, clamping model activations along the Assistant Axis), and establish training-time preventative steering approaches that likewise mitigate persona drift and stabilize language model personas in deployment.
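To make "clamping model activations along the Assistant Axis" concrete, the following is a minimal sketch of one way such a clamp could be written. It is not the paper's implementation: the function name cap_along_axis, the lower and upper bounds, and the choice of a two-sided clamp are all assumptions made for illustration, and plain Python lists stand in for the framework tensors a real model would use.

```python
import math


def cap_along_axis(hidden, axis, lower=-math.inf, upper=math.inf):
    """Clamp the component of `hidden` along `axis` to [lower, upper].

    Hypothetical helper: `hidden` is one activation vector, `axis` is a
    direction in the same space (assumed here to be the Assistant Axis).
    Components orthogonal to `axis` are left unchanged.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    unit = [a / norm for a in axis]

    # Scalar projection of the activation onto the unit direction.
    proj = sum(h * u for h, u in zip(hidden, unit))

    # Clamp the projection, then shift the activation along the axis
    # by exactly the amount needed to reach the clamped value.
    capped = min(max(proj, lower), upper)
    shift = capped - proj
    return [h + shift * u for h, u in zip(hidden, unit)]


if __name__ == "__main__":
    # The projection 3.0 exceeds the assumed upper bound 1.0, so only
    # the first coordinate (the one along the axis) is pulled back.
    print(cap_along_axis([3.0, 4.0], [2.0, 0.0], lower=-1.0, upper=1.0))
```

In a deployed model this operation would presumably run on batched tensors at selected layers during the forward pass; doing that at scale and with low overhead is part of what the problem statement leaves open.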
References
Third, while activation capping demonstrates that persona drift can be mitigated at inference time, productionizing such interventions, or exploring alternatives like preventative steering during training remain open challenges.
— The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
(2601.10387 - Lu et al., 15 Jan 2026) in Discussion, Future work subsection