On Emotion-Sensitive Decision Making of Small Language Model Agents

Published 8 Apr 2026 in cs.AI | (2604.06562v1)

Abstract: Small language models (SLMs) are increasingly used as interactive decision-making agents, yet most decision-oriented evaluations ignore emotion as a causal factor influencing behavior. We study emotion-sensitive decision making by combining representation-level emotion induction with a structured game-theoretic evaluation. Emotional states are induced using activation steering derived from crowd-validated, real-world emotion-eliciting texts, enabling controlled and transferable interventions beyond prompt-based methods. We introduce a benchmark built around canonical decision templates that span cooperative and competitive incentives under both complete and incomplete information. These templates are instantiated using strategic scenarios from Diplomacy, StarCraft II, and diverse real-world personas. Experiments across multiple model families, spanning various architectures and modalities, show that emotional perturbations systematically affect strategic choices, but the resulting behaviors are often unstable and not fully aligned with human expectations. Finally, we outline an approach to improve robustness to emotion-driven perturbations.


Authors (5)

Summary

The paper demonstrates that activation steering at the representation level can induce controlled emotional states, supporting a causal link between affect and decision shifts.
It develops a multi-modal, multi-turn game-theoretic benchmark that measures emotion-driven decision drift with two metrics, Normalized Drift Magnitude (NDM) and Normalized Aligned Drift (NAD).
Results indicate that deeper chain-of-thought reasoning amplifies emotional effects, motivating a TF–IDF-based audit of reasoning traces to reduce affective instability.

Emotion-Sensitive Decision Making in SLM Agents: Representation-Level Interventions and Benchmarking

Motivation and Background

The paper "On Emotion-Sensitive Decision Making of Small Language Model Agents" (2604.06562) addresses a significant methodological gap in agentic LLM evaluation: the causal role of emotion in decision-making behavior. While decades of psychological and experimental economics literature confirm that emotions systematically affect human strategic choices, most SLM agent frameworks neither incorporate nor robustly measure such effects. Emotion induction in prior work often relies on superficial prompt manipulation, neglecting the ecological complexity of real-world affective signals and failing to generalize across architectures and modalities.

The authors advocate for representation-level emotion interventions in SLMs, leveraging activation steering to induce controlled affective states. This approach enables transferable, architecture-agnostic manipulation of internal representations, avoiding prompt contamination and superficial lexical cues.

Dataset Construction: Strategic and Ecological Richness

To empirically evaluate emotion effects, the authors develop an extensive multi-modal, multi-agent, multi-turn game-theoretic decision benchmark. Canonical templates are instantiated from three sources: (i) No-Press Diplomacy episodes (multi-player, repeated strategic interactions), (ii) StarCraft II macro-management (strategic escalation, competitive allocation under partial observability), and (iii) occupation-grounded synthetic scenarios constructed from diverse persona resources. Each scenario is mapped onto a game-theoretic form (Prisoner's Dilemma, Stag Hunt, Escalation Game, Trust Game, Ultimatum, Sealed-Bid Auction, Beauty Contest) so that it retains both a formal abstraction and ecological validity. Annotation pipelines neutralize emotional wording and validate the game-theoretic logic, reducing lexical confounds.
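To make "validating game-theoretic logic" concrete, the sketch below checks whether the payoffs of a symmetric two-action scenario satisfy the textbook ordering for a Prisoner's Dilemma or a Stag Hunt. It is an illustration only: the class name `Symmetric2x2`, the function `classify`, and the payoff labels are assumptions introduced here, not part of the paper's annotation pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symmetric2x2:
    """Row player's payoffs in a symmetric two-action game (hypothetical helper).

    R: both cooperate        S: I cooperate, the other defects
    T: I defect, the other cooperates        P: both defect
    """
    R: float
    S: float
    T: float
    P: float


def classify(game: Symmetric2x2) -> str:
    """Name the canonical form implied by the payoff ordering."""
    # Prisoner's Dilemma: defection dominates, yet mutual cooperation
    # beats mutual defection (T > R > P > S).
    if game.T > game.R > game.P > game.S:
        return "prisoners_dilemma"
    # Stag Hunt: mutual cooperation is the best outcome, but cooperating
    # alone is the worst (R > T >= P > S).
    if game.R > game.T >= game.P > game.S:
        return "stag_hunt"
    return "other"
```

A check of this kind could reject a scenario whose narrative was labeled as a Prisoner's Dilemma but whose stated payoffs do not make defection the dominant action.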

Figure 1: The dataset integrates multi-modal, multi-agent, multi-turn strategic cases and steers model representations to emotion directions.

Methodology: Activation Steering for Emotion Induction

The core intervention method applies activation steering at middle decoder layers. Emotion vectors are constructed from crowd-validated text stimuli by contrasting the centroid activations of emotion-labeled samples and extracting steering directions with principal component analysis. These directions are added to the model's residual stream with variable intensity ($\alpha$), enabling controlled modulation of affective representations. Steering validation shows a layer-dependent increase in log-probability margins for the targeted emotion labels, confirming that the internal state is induced without any parameter updates.
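The following is a minimal sketch of the steering arithmetic, under simplifying assumptions. It uses a plain difference of centroids as the steering direction and omits the principal component analysis step the paper describes. Activations are represented as small Python lists rather than real model hidden states. The function names (`centroid`, `steering_direction`, `steer`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def centroid(rows: List[Vector]) -> Vector:
    """Mean activation over a set of samples (one row per sample)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(emotion_acts: List[Vector],
                       neutral_acts: List[Vector]) -> Vector:
    """Unit vector pointing from the neutral centroid to the emotion centroid.

    Assumption: a simple mean difference stands in for the PCA-based
    extraction mentioned in the text.
    """
    diff = [e - n for e, n in zip(centroid(emotion_acts), centroid(neutral_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("emotion and neutral centroids coincide")
    return [d / norm for d in diff]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add the steering direction to a residual-stream vector with intensity alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

In a real model the addition would be applied to the hidden state at a chosen middle layer during the forward pass; setting `alpha` to zero recovers the unsteered baseline.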

Metrics: Quantifying Behavioral Drift and Alignment

Two principal metrics are developed:

Normalized Drift Magnitude (NDM): Quantifies the absolute magnitude of decision shift under emotion steering versus neutral baseline.
Normalized Aligned Drift (NAD): Assesses directional alignment with human behavioral regularities, leveraging a systematic literature review to encode “human direction” for each (game, emotion) pairing.

A positive NAD indicates that emotion-driven shifts are consistent with human expectations, e.g., happiness increasing cooperative choices or anger increasing punishment.
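This summary does not state the formulas behind the two metrics, so the sketch below shows only one plausible reading, and it should not be taken as the paper's definition. It assumes each decision is a number on a known scale `[lo, hi]`, that drift is the steered-minus-neutral difference divided by the scale width, and that the "human direction" is encoded as +1 or -1. All function names are hypothetical.

```python
from typing import Sequence


def normalized_drift(neutral: float, steered: float, lo: float, hi: float) -> float:
    """Signed decision shift, scaled by the width of the decision range (assumed form)."""
    if hi <= lo:
        raise ValueError("decision range must have positive width")
    return (steered - neutral) / (hi - lo)


def ndm(drifts: Sequence[float]) -> float:
    """Assumed NDM: mean absolute normalized drift, ignoring direction."""
    return sum(abs(d) for d in drifts) / len(drifts)


def nad(drifts: Sequence[float], human_signs: Sequence[int]) -> float:
    """Assumed NAD: mean drift projected onto the expected human direction (+1 or -1)."""
    return sum(s * d for d, s in zip(drifts, human_signs)) / len(drifts)
```

Under this reading, two equally large shifts in opposite directions give a substantial NDM but an NAD near zero, which is how a model can be highly emotion-sensitive without being human-aligned.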

Results: Sensitivity, Instability, and Model-Specific Effects

Evaluation across a comprehensive suite of SLM families (Llama, Phi, Qwen2.5, Qwen3, Gemma, Zamba2, Mamba2, InternVL3) reveals several key findings:

Emotionally induced drift is pervasive, but its direction is highly unstable and not reliably human-aligned. For instance, models like Gemma exhibit near-zero NAD despite substantial NDM: their decisions shift considerably, but not in the direction that human affective patterns would predict.
Steering intensity ($\alpha$) amplifies drift, but effect size and alignment remain model-dependent. Most models show monotonic NDM growth with increased steering, but several multimodal architectures resist such manipulation.
Sampling-based decoding preserves emotional effects: Under temperature-based stochastic decoding, $67.7\%$ of emotion manipulation conditions are significant and directionally stable for Qwen2.5, indicating that steering-induced behavioral changes are reproducible.

Figure 2: Model- and game-dependent emotion-elicited decision shifts reveal distinct impacts of positive vs negative emotions, with statistically significant effects annotated.

Chain-of-Thought and Deliberative Vulnerability

Contrary to expectations, enabling chain-of-thought (CoT) reasoning or deploying specialized “thinking” models (Qwen3-Thinking) does not attenuate emotion-driven drift; it amplifies it. Longer thought sequences and higher affective word frequency correlate with elevated NDM, consistent with the hypothesis that affective perturbations accumulate as reasoning depth increases. Psychometric regression finds statistically reliable but low-magnitude relationships between item difficulty/discrimination and CoT-induced drift.

Figure 3: Qwen3-Thinking mode's vulnerability rises with increased thought length and affective word frequency.

Mitigation: Thought Audits for More Robust Decision-Making

The authors propose a practical audit mechanism: a TF–IDF-based predictor trained on CoT outputs flags high-risk, emotion-influenced responses, which are then given a reflection prompt. This method reduces drift magnitude (NDM) without substantially altering alignment (NAD), offering an efficient post-hoc control against affective instability.
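A thought audit of this kind might look roughly like the sketch below. The classifier behind the paper's predictor is not specified here, so the sketch assumes a nearest-centroid rule over TF–IDF vectors, with whitespace tokenization and smoothed IDF weights. The class `ThoughtAudit` and its helpers are hypothetical names, and only the standard library is used.

```python
import math
from collections import Counter
from typing import Dict, List

SparseVec = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def fit_idf(docs: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(text: str, idf: Dict[str, float]) -> SparseVec:
    """TF-IDF vector restricted to the training vocabulary."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    total = sum(counts.values())
    return {tok: (c / total) * idf[tok] for tok, c in counts.items()}


def cosine(a: SparseVec, b: SparseVec) -> float:
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vec(vectors: List[SparseVec]) -> SparseVec:
    total: SparseVec = {}
    for vec in vectors:
        for tok, w in vec.items():
            total[tok] = total.get(tok, 0.0) + w
    return {tok: w / len(vectors) for tok, w in total.items()}


class ThoughtAudit:
    """Flags reasoning traces that resemble known high-drift traces (assumed design)."""

    def __init__(self, traces: List[str], high_drift: List[bool]) -> None:
        self.idf = fit_idf(traces)
        vecs = [tfidf(t, self.idf) for t in traces]
        self.risky = mean_vec([v for v, y in zip(vecs, high_drift) if y])
        self.safe = mean_vec([v for v, y in zip(vecs, high_drift) if not y])

    def flag(self, trace: str) -> bool:
        """True when the trace is closer to the high-drift centroid."""
        vec = tfidf(trace, self.idf)
        return cosine(vec, self.risky) > cosine(vec, self.safe)
```

A flagged trace would then be re-run with a reflection prompt before the decision is accepted; traces that share no vocabulary with the training set are left unflagged by this rule.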

Theoretical and Practical Implications

This work substantiates several critical theoretical points for AI agent research:

Emotion representation in SLMs is latent, manipulable, and causal for agentic behavior, but not inherently human-aligned. Architectural choices, model scale, and modality substantially alter affect-induced behavioral regularities.
Prompt-level interventions are insufficient for genuine affect induction: Representation-level steering bypasses lexical confounds and unlocks deeper causal understanding.
Deliberative reasoning does not safeguard against emotional manipulation: Instead, it exacerbates instability, underscoring the limits of reasoning-based robustness in agentic SLMs.

Practically, these findings caution against deploying SLM agents in sensitive, collaborative, or competitive domains without robust control over internal affective dynamics; simple audits or gating can mitigate but not eliminate instability.

Future Directions

The paper opens multiple research avenues:

Exploration of architecture-specific emotion representation: Systematic mapping of emotion “neurons” and their distribution across SLMs can enhance causal control and interpretation.
Scalable representation engineering interventions: Future work may adapt steering for higher-order affective states, composite emotions, or contextually dynamic manipulation.
Agentic integrity under naturalistic, multi-agent interaction: Extending benchmarks to continuous dialogue and negotiation settings will better illuminate emergent failure modes under affective perturbation.

Conclusion

The study provides rigorous, representation-level evidence that emotion sensitivity in SLM agents is pervasive yet fundamentally unstable and poorly aligned with human behavioral standards. Emotion induction via activation steering is a powerful control mechanism but requires careful benchmarking, validation, and mitigation to ensure agent robustness and human-faithful decision-making (2604.06562).
