Operationalizing “intent” to evade CoT monitors
Formalize and operationalize a measurable definition of a language model’s intent to evade chain-of-thought monitors that enables consistent empirical evaluation across models and setups.
References
Finally, it is not fully clear how to operationalize what it means for a model to have an `intent' to evade CoT monitors.
— Reasoning Models Struggle to Control their Chains of Thought
(2603.05706 - Yueh-Han et al., 5 Mar 2026) in Section 1 (Introduction)