Precise marginal effects of reasoning token quantity on misuse assistance
Determine the precise marginal effect of reasoning token quantity on the probability that reasoning-enabled large language models (Claude Sonnet 3.7, Claude Sonnet 4, Claude Sonnet 4.5, Claude Opus 4.1, OpenAI o4-mini, and OpenAI o4-mini-deep-research) achieve high actionability and information access scores in multi-turn, long-form fraud and cybercrime tasks. Doing so requires resolving the differences in how reasoning is specified across model families, which currently prevent comparable estimation.
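To make the target quantity concrete, the following minimal sketch shows one common way to define a marginal effect on a probability. It assumes a simple binary logistic model of "high score" against reasoning token count, which may differ from the model the paper actually fits. The coefficients, the token budgets, and the function names are hypothetical placeholders, not values from the source.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(intercept: float, slope: float, token_counts: list[float]) -> float:
    """Average over observations of dP/d(tokens) = slope * p * (1 - p).

    Assumes P(high score) = sigmoid(intercept + slope * tokens).
    """
    effects = []
    for tokens in token_counts:
        p = sigmoid(intercept + slope * tokens)
        effects.append(slope * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical values for illustration only (not estimates from the paper).
intercept = -2.0
slope = 0.5  # change in log-odds per 1,000 reasoning tokens
budgets_k = [0.0, 1.0, 2.0, 4.0, 8.0]  # reasoning budgets, in thousands of tokens

ame = average_marginal_effect(intercept, slope, budgets_k)
print(f"Average marginal effect per 1,000 tokens: {ame:.4f}")
```

A positive slope guarantees a positive marginal effect, but its magnitude is interpretable across models only if token counts are measured on a shared scale. The quoted passage under References indicates that this is what is missing when reasoning is specified differently across model families.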
References
Due to differences in how reasoning is specified across model families, we cannot provide precise marginal effects on the probability of high scores, though the positive coefficient indicates that reasoning consistently increases assistance levels across models.
— A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios
(2602.21831 - Mai et al., 25 Feb 2026) in Results, subsection 'Impact of Reasoning and Search'