Generalization of Among Us LLM-Agent Findings Across Models and Training Paradigms
Determine whether the speech-act distributions and deception strategies measured for Llama 3.2 agents in the text-based Among Us multi-agent simulation generalize to other large language model architectures and to alternative training paradigms.
References
For this, our experiments used only a single underlying model architecture (Llama 3.2), so it is unknown if other LLM models or training paradigms would offer the same results.
— Deception and Communication in Autonomous Multi-Agent Systems: An Experimental Study with Among Us
(2603.26635 - Milkowski et al., 27 Mar 2026) in Subsection 'Limitations and Future Work' under 'Conclusions'