Generalizability of ToM-like Emergence Across Model Architectures

Determine whether the emergence of Theory-of-Mind-like behavior observed in memory-equipped large language model (LLM) poker agents generalizes to model families and architectures beyond Anthropic Claude.

Background

The study reports that autonomous LLM agents (Claude Sonnet) develop Theory-of-Mind-like opponent models during extended Texas Hold'em sessions, but only when equipped with persistent memory. This behavior includes predictive and recursive opponent modeling and supports strategic deception, suggesting an emergent social-cognition capability driven by interaction dynamics.

However, all poker-playing agents in the experiments were instances of a single model family (Anthropic Claude). Although the authors cross-validated their Theory-of-Mind level coding with GPT-4o to assess annotation reliability, the generative behavior underlying ToM-like emergence was evaluated only in Claude-based agents. The authors therefore note that it remains an open question whether such emergence would occur in other model families or architectures.

References

While cross-model validation of ToM coding with GPT-4o yielded high agreement ($\kappa = 0.81$), the generalizability of ToM-like behavior emergence to other model architectures remains an open question.
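The reported $\kappa = 0.81$ is Cohen's kappa, a chance-corrected agreement statistic between two annotators (here, Claude's ToM-level coding versus GPT-4o's). A minimal sketch of the computation, using hypothetical ToM-level labels (the paper does not publish its per-hand codings, so the data below is illustrative only):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independent labeling, using each
    # annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ToM-level codings (levels 0-2) from two annotators
# over ten poker hands; one disagreement out of ten.
claude_codes = [0, 1, 1, 2, 0, 2, 1, 1, 0, 2]
gpt4o_codes  = [0, 1, 1, 2, 0, 2, 1, 0, 0, 2]
print(round(cohens_kappa(claude_codes, gpt4o_codes), 2))  # → 0.85
```

Values above roughly 0.8 are conventionally read as near-perfect agreement, which is why the paper treats its cross-model validation as reliable while still flagging that annotation agreement does not establish generative ToM emergence in the second model.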

Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents  (2604.04157 - Lin et al., 5 Apr 2026) in Discussion, Limitations subsection