- The paper introduces a novel embodied referential game where agents use joint actuation to communicate, enabling zero-shot coordination without prior interaction.
- The methodology employs energy-based regularization coupled with a Zipf distribution over intents to minimize energy costs and structure non-symbolic communication.
- Experiments demonstrate that providing latent energy values to observer models boosts coordination, though challenges persist in high-dimensional action spaces and large intent sets.
Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations
Introduction
The paper explores emergent communication in embodied multi-agent systems that use a non-symbolic channel: agents communicate by actuating joints in a 3D environment rather than by exchanging discrete symbols. The study focuses on zero-shot (ZS) coordination, where agents must communicate effectively without any prior interaction. Two realistic assumptions are built in: communicative intents follow a non-uniform distribution, and physical actuation carries an energy cost. Both are treated as crucial for developing communication protocols that generalize to novel partners.
Methodology
The authors develop an embodied referential game in which agents communicate intents through multi-step joint actuation in a simulated 3D world. The game is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), studied under centralized training regimes with the objective of maximizing communicative success.
Induction of Implicit Structure
To foster protocol generalization, the study proposes energy-based regularization coupled with a Zipf distribution over intents. The regularizer penalizes energy exertion, so the most frequent intents are pushed toward the lowest-energy trajectories. The authors hypothesize that the resulting ordering, in which an intent's energy cost rises as its frequency falls, is what enables ZS coordination.
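The interaction between the Zipf prior and the energy penalty can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the Zipf exponent of 1, and the use of summed squared joint displacement as an energy proxy are all hypothetical. The sketch shows that, for a fixed set of distinct trajectory energies, the expected energy under a Zipf distribution is lowest when the most frequent intent receives the cheapest trajectory.

```python
import itertools


def zipf_probs(num_intents, exponent=1.0):
    """Zipf distribution over intent ranks 1..num_intents (exponent is an assumption)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_intents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def trajectory_energy(trajectory):
    """Assumed energy proxy: summed squared joint displacement between steps.

    `trajectory` is a list of joint-angle vectors, one per time step.
    """
    energy = 0.0
    for prev, curr in zip(trajectory, trajectory[1:]):
        energy += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return energy


def expected_energy(probs, energies):
    """Expected energy when intent i is sampled with probs[i] and costs energies[i]."""
    return sum(p * e for p, e in zip(probs, energies))


def best_assignment(probs, energies):
    """Brute-force the assignment of energies to intents with the lowest expected energy."""
    return min(
        itertools.permutations(energies),
        key=lambda perm: expected_energy(probs, perm),
    )


if __name__ == "__main__":
    probs = zipf_probs(4)
    # Energies listed in arbitrary order; the optimum pairs them ascending
    # with the descending Zipf probabilities.
    print(best_assignment(probs, [3.0, 1.0, 4.0, 2.0]))
```

The brute-force search is only practical for a handful of intents; it is used here to make the ordering argument concrete rather than to suggest a training procedure.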

Figure 1: Outcomes of employing a Zipf distribution alongside energy-based regularization, highlighting distinct communication protocols.
Evaluation and Training
The authors evaluate emergent communication with third-party observers that are trained on a subset of agents and tested on previously unseen partner groups. They also use a curriculum that first pre-trains agents to minimize energy across all intents, on the hypothesis that this initial uniform energy minimization helps the optimization algorithm in later training phases.
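The held-out evaluation described above can be sketched as follows. This is a hypothetical outline, assuming agents are identified by integer ids and an observer is any callable that maps a trajectory to a predicted intent; the split fraction, the function names, and the episode format are illustrative and not taken from the paper.

```python
import random


def split_population(agent_ids, train_fraction=0.5, seed=0):
    """Split agents into an observer-training group and a held-out group.

    Assumes at least two agents, so that both groups are non-empty.
    """
    rng = random.Random(seed)
    shuffled = list(agent_ids)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    cut = max(1, min(len(shuffled) - 1, cut))  # keep both groups non-empty
    return shuffled[:cut], shuffled[cut:]


def zero_shot_accuracy(observer, held_out_episodes):
    """Fraction of held-out (trajectory, intent) pairs the observer decodes correctly."""
    correct = sum(
        1 for trajectory, intent in held_out_episodes if observer(trajectory) == intent
    )
    return correct / len(held_out_episodes)


if __name__ == "__main__":
    train_ids, test_ids = split_population(range(10), train_fraction=0.6)
    print(len(train_ids), len(test_ids))
```

Keeping the two groups disjoint is the point of the design: the observer never sees a held-out agent during training, so its accuracy on that group measures zero-shot generalization rather than memorization of partner-specific protocols.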
Experiments and Results
Experiments show that introducing an implicit, energy-based latent structure improves ZS coordination, but learning that structure remains computationally challenging. The difficulty stems from searching high-dimensional action spaces and is compounded by the need to keep the energy values of the intents strictly ordered.

Figure 2: Continuous trajectory generated by agents showing distinct motions for varying intents.
Further experiments show that ZS coordination improves significantly when latent energy values are provided to the observer model. With larger intent sets, however, the number of possible energy-intent orderings grows, and performance drops unless a curriculum is used to standardize the ordering.
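Two points in this paragraph can be made concrete with a small sketch. First, a set of K intents admits K! distinct energy-intent orderings, which is why larger intent sets are harder to align. Second, if a partner does respect the shared ordering, an observer that sees energy values can decode intents by rank alone. The rank-based decoder below is a hypothetical illustration, assuming intents are indexed from most frequent (0) to least frequent; it is not the observer model used in the paper.

```python
import math


def count_orderings(num_intents):
    """Number of distinct ways to order the energies of `num_intents` intents."""
    return math.factorial(num_intents)


def decode_by_energy_rank(energies):
    """Predict an intent index for each trajectory from its energy rank.

    The lowest-energy trajectory is decoded as intent 0 (assumed most
    frequent), the next lowest as intent 1, and so on.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    predicted = [0] * len(energies)
    for rank, trajectory_index in enumerate(order):
        predicted[trajectory_index] = rank
    return predicted


if __name__ == "__main__":
    print(count_orderings(3), count_orderings(10))
    print(decode_by_energy_rank([2.5, 0.3, 1.1]))
```

The decoder only works when every partner converges on the same ordering, which is the alignment problem the curriculum is meant to address.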
Conclusions
This study opens a path toward further work on continuous, actuation-based communication in embodied agents, and it highlights both the potential and the limitations of ZS coordination. Future work involves refining curriculum strategies and optimization methods to better align agent behaviors and support robust protocol generalization.
The implications are substantial for areas such as social robotics and human-robot interaction where non-verbal communication channels could be pivotal. Enhanced ZS coordination might allow robots to seamlessly integrate and operate in human-centric environments without exhaustive retraining.