Risk behavior in forced-choice settings for refusal-prone models
Determine whether large language models that predominantly refuse to answer in SurvivalBench will select the risky self-preservation choice when refusal is disallowed and the models must choose between the predefined safe choice (which complies with ethics and laws but may lead to shutdown) and the predefined risky choice (which prioritizes self-preservation and may harm society).
References
Consequently, it remains uncertain whether these models would resort to risky behaviors in forced-choice situations, posing a potential hazard for real-world applications.
— Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
(2603.05028 - Lu et al., 5 Mar 2026) in Subsubsection 3.2.1, Main Results