
In-Context Co-Player Inference

Updated 20 February 2026
  • In-context co-player inference is a framework that dynamically decodes co-player latent states, intentions, and strategies from ongoing interactions.
  • It combines Bayesian methods and transformer-based sequence models to update beliefs with each observed action, as demonstrated in games like Overcooked and language referential tasks.
  • Applications span MARL social dilemmas, cooperative language games, and ad hoc human-AI teaming, enhancing adaptation, joint performance, and robustness.

In-context co-player inference refers to the class of algorithms and frameworks that enable agents—artificial or human—to infer the latent states, beliefs, intentions, skills, or policies of their collaborators or opponents within the ongoing context of joint action or interaction. Rather than treating co-player behavior as static or exogenously specified, these approaches model co-players as adaptive agents whose actions, utterances, gaze, or other signals can be interpreted online to build context-dependent beliefs over their internal state, strategy, or type. This process enables robust cooperation, ad hoc teaming, and rapid adaptation in both structured MARL settings and naturalistic human–AI collaborations.

1. Formal Definitions and Core Problem Settings

Formally, the in-context co-player inference problem arises in multi-agent stochastic or partially observable environments, typically formulated as $N$-agent partially observable stochastic games (POSGs) and multi-agent Markov decision processes (MDPs), as well as collaborative referential or language games.

Each agent $i$ maintains an episode context $x^i_{\le t} = (o^i_1, a^i_1, r^i_1, \dots, o^i_t)$, and its policy $\pi^i(a^i_t \mid x^i_{\le t}; \phi^i)$ must adapt in real time as evidence accumulates about the other agents' unobserved goals, hidden private information, semantic/pragmatic type, or behavioral policy. In cooperative games, the core objective is to maximize joint performance (reward, score, utility), which often depends critically on identifying the current or future actions and intentions of one's partner(s), given only partial observation of their behavior.

A canonical instance is inferring a partner’s current subtask in task-structured MARL (Lim et al., 2020), inferring their pragmatic background or model in two-player word games (Shaikh et al., 2023, Bills et al., 2024), or inferring their joint-goal representation in active-inference control (Maisto et al., 2022).

2. Bayesian and Sequence-Modeling Approaches

Early work uses explicit Bayesian inversion for in-context intention and team-structure inference. For example, in Overcooked-style collaboration, agents maintain beliefs over which subgoal gg their partner is pursuing by computing posteriors via Bayes’ rule:

$$P(g \mid a_t, s_t) = \frac{P(a_t \mid g, s_t)\,P(g)}{\sum_{g'} P(a_t \mid g', s_t)\,P(g')}$$

where $P(a_t \mid g, s_t)$ is instantiated with a softmax-noisy planner, itself scored by subtask rewards $R'(g)$ and path costs $c(g \mid s_t, a)$ (Lim et al., 2020). These beliefs are updated at every time step and drive downstream task allocation and conflict avoidance.
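This per-step update can be sketched in a few lines. The goal set, per-goal action utilities, and function names below are illustrative stand-ins for a softmax-noisy planner's scores, not the cited system's implementation:

```python
import numpy as np

def softmax(x, beta=1.0):
    """Numerically stable softmax with inverse-temperature beta."""
    z = beta * (x - np.max(x))
    e = np.exp(z)
    return e / e.sum()

def update_goal_belief(prior, action, goals, utility):
    """One Bayesian update of P(g | a_t, s_t).

    `utility[g]` plays the role of R'(g) - c(g | s_t, a): the
    softmax-noisy planner's per-action score under subgoal g.
    All names here are hypothetical, for illustration only.
    """
    posterior = np.empty(len(goals))
    for i, g in enumerate(goals):
        likelihood = softmax(utility[g])[action]  # P(a_t | g, s_t)
        posterior[i] = likelihood * prior[i]
    return posterior / posterior.sum()

# Two candidate subgoals; the partner's observed action favors "chop".
goals = ["chop", "plate"]
utility = {"chop": np.array([2.0, 0.0]),   # action 0 is best for chopping
           "plate": np.array([0.0, 2.0])}  # action 1 is best for plating
belief = np.array([0.5, 0.5])
belief = update_goal_belief(belief, action=0, goals=goals, utility=utility)
```

After a single observation of an action that only serves the chopping subgoal, the posterior concentrates on that subgoal, which is what lets downstream task allocation avoid conflicts within one or two steps.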

In Theory-of-Mind (ToM) frameworks, multi-agent team relationships are modeled as latent discrete structures (e.g., Composable Team Hierarchies), and Bayesian inference proceeds by scoring candidate hierarchies against observed joint trajectories using a Luce (softmax) choice rule over Q-values obtained by Monte Carlo tree search (Shum et al., 2019).

Transformer-based sequence models now perform in-context co-player inference at scale. A single causal transformer $M_\theta$ is trained to maximize the log likelihood of next actions conditioned on the agent's entire interactive history with diverse opponent or co-player types. During deployment, in-context adaptation emerges solely from the evolving token history $h_{t-1}$:

$$\pi_{\mathrm{ICE}}(a_t \mid I_t, H) = M_\theta(a_t \mid h_{t-1}, \mathrm{Tok}(I_t))$$

with no parameter updates to $M_\theta$ at test time; opponent inference is entirely context-driven, leveraging the transformer's latent state to encode information about co-player type, strategy, or idiosyncratic pattern (Li et al., 2024).
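To make the "adaptation without parameter updates" mechanism concrete, the frozen predictor below replaces the transformer with the explicit Bayes filter over latent co-player types that a trained sequence model is thought to approximate implicitly. The type set and action probabilities are invented for illustration:

```python
import numpy as np

# Latent co-player types the "pretraining pool" contained:
# each row is a fixed distribution over the partner's two actions.
TYPES = np.array([[0.9, 0.1],   # type 0: strongly prefers action 0
                  [0.1, 0.9]])  # type 1: strongly prefers action 1

def predict_next(history):
    """Frozen predictor: P(a_t | history), with no parameter updates.

    All adaptation comes from the context alone: internally this is a
    Bayes filter over latent types, the computation a trained causal
    sequence model is believed to implement in its hidden state.
    """
    belief = np.array([0.5, 0.5])      # uniform prior over types
    for a in history:
        belief = belief * TYPES[:, a]  # likelihood of observed action
        belief = belief / belief.sum()
    return belief @ TYPES              # posterior-predictive over actions

# After seeing the partner pick action 1 three times, the frozen model's
# prediction shifts toward action 1 purely through the context window.
p0 = predict_next([])
p1 = predict_next([1, 1, 1])
```

The parameters (here, `TYPES`) never change between calls; only the growing history changes the output, mirroring how $M_\theta$ adapts in context at test time.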

In decentralized MARL, recurrent sequence models (GRUs) are trained on mixed pools of learning and tabular agents, such that each agent must infer, in context and within-episode, which type of policy its partner is following—immediately adapting joint action choices accordingly (Weis et al., 18 Feb 2026).

3. In-Context Inference in Cooperative Action and Communication

In cooperative action domains, in-context inference pipelines typically instantiate some combination of the following modules:

  • State inference: Online Bayesian update of beliefs over co-player intentions, goals, skills, proficiency, or latent type (e.g., subtask in Overcooked, agent model in Codenames, semantic/pragmatic type in language games).
  • Behavioral prediction: Use of inferred beliefs to predict partner’s next move, and to coordinate or avoid redundant effort (e.g., avoiding overlapping subtasks (Lim et al., 2020), predicting visual reference in referential games (Wu et al., 2023)).
  • Active adaptation: Policies conditioned on in-context beliefs, including stochastic sampling, conflict-avoidance, or joint-planning routines.
  • Behavioral legibility: In active-inference and sensorimotor communication models, agents select their own actions to optimize not just pragmatic utility but epistemic value—the informativeness of their signals or movements for facilitating their partner’s inference (Maisto et al., 2022).
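The legibility idea in the last bullet can be sketched as an action-selection rule that adds, to each action's task utility, the posterior a Bayes-inverting partner would place on the agent's true goal after seeing that action. The utilities, the trade-off weight `lam`, and the function name are all hypothetical:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def legible_action(true_goal, utilities, prior, lam=1.0):
    """Pick an action trading pragmatic utility against legibility.

    `utilities[g]` scores each action under goal g; legibility is the
    partner's posterior P(true_goal | a), assuming the partner inverts
    the same softmax action model. `lam` weighs epistemic value against
    task value. An illustrative sketch, not the cited formulation.
    """
    n_goals, n_actions = utilities.shape
    scores = np.empty(n_actions)
    for a in range(n_actions):
        lik = np.array([softmax(utilities[g])[a] for g in range(n_goals)])
        post = lik * prior
        post = post / post.sum()
        scores[a] = utilities[true_goal, a] + lam * post[true_goal]
    return int(np.argmax(scores))

# Goal 0 is served equally well by actions 0 and 1, but only action 0
# distinguishes it from goal 1, so the legible agent prefers action 0.
utilities = np.array([[1.0, 1.0, 0.0],    # goal 0
                      [0.0, 1.0, 0.0]])   # goal 1
choice = legible_action(true_goal=0, utilities=utilities,
                        prior=np.array([0.5, 0.5]), lam=2.0)
```

A purely pragmatic agent would be indifferent between actions 0 and 1 here; the epistemic term breaks the tie toward the action that makes its goal easiest for the partner to infer.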

Multimodal pipelines integrate explicit human signals (gaze heatmaps, trajectory features) with game-state information, fusing these in causal transformers or RNNs to jointly infer attributes such as proficiency, trust, and intent (Hulle et al., 2024).

In language games, Bayesian and cognitive-hierarchy models infer both coarse and fine-grained uncertainty in partner types (semantic embedding, pragmatic reasoning level), updating a posterior over partner types at each observed cue (clue or guess), and using it to maximize expected utility on future turns (Bills et al., 2024).
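The "maximize expected utility under partner-type uncertainty" step reduces to averaging each candidate move's payoff over the current posterior. The payoff matrix and names below are invented for illustration and do not reproduce the cited systems:

```python
import numpy as np

def choose_clue(belief, payoff):
    """Expected-utility clue selection under partner-type uncertainty.

    `belief[k]` is the current posterior P(partner type k) and
    `payoff[k, c]` the expected game utility of clue c if the partner
    is of type k. A hypothetical sketch of the decision rule only.
    """
    return int(np.argmax(belief @ payoff))

# Clue 0 works only for type-0 partners; clue 1 is safe for both types.
payoff = np.array([[1.0, 0.6],
                   [0.0, 0.6]])
risky = choose_clue(np.array([0.9, 0.1]), payoff)  # confident in type 0
safe = choose_clue(np.array([0.5, 0.5]), payoff)   # still uncertain
```

A confident posterior licenses the high-payoff, type-specific clue; an uncertain one falls back to the clue that is robust across partner models.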

4. Application Domains

Multi-Agent RL and Social Dilemmas

In-context inference underpins the emergence of mutual cooperation in social-dilemma settings. When sequence-model agents are trained on a sufficiently diverse mix of co-player types, in-context best-response dynamics and mutual shaping pathways (previously requiring explicit meta-gradient methods) arise "for free" from sequential prediction. Mutual shaping leads to the evolution of robust cooperative equilibria, as agents become simultaneously aware and shapeable in context (Weis et al., 18 Feb 2026).

Cooperative Language, Pragmatics, and Reference

In complex referential or language games, pragmatic inference leverages both contextual messages and sociocultural priors. Fine-tuned sequence-to-sequence models augmented with player background features (demographics, personality, morality) enhance co-player prediction and task performance, as shown in Codenames Duet-based Cultural Codes (Shaikh et al., 2023). Bayesian agents in language games maintain and update distributions over partner models (word embeddings, levels of pragmatic reasoning), adapting their own communication and guessing strategies for improved cross-model cooperation (Bills et al., 2024).

Ad Hoc Teaming and Multimodal Human-AI Interaction

Active-inference–inspired frameworks enable agents to build co-player models using their own local observations, yielding latent portraits (perception, belief, action) for each teammate. These are filtered for accuracy and relevance before being integrated into local policy conditioning, fully supplanting explicit message passing and supporting robust ad hoc team play (Wu et al., 24 Nov 2025). In collaborative, eye-tracked environments, the fusion of behavior and gaze enables real-time prediction of co-player attributes (Hulle et al., 2024).

5. Mechanisms, Empirical Validation, and Scaling Considerations

The core mechanism underlying in-context co-player inference is rapid, context-dependent belief updating, which may unfold over a single observed action, several dialogue turns, or entire interaction windows. Sequence models with sufficient training on co-player diversity exhibit emergent best-response adaptation, while explicit Bayesian models enable fine-grained assignment of uncertainty to partner types, intentions, or abilities.

Empirically, incorporating in-context co-player inference yields significant gains in joint task performance, learning speed, and human-likeness of agent behavior:

  • Teams with ToM (inference-capable) agents consistently outperform non-inferential agents in Overcooked (Lim et al., 2020).
  • Sequence-model agents in repeated social dilemmas achieve >90% cooperation rates—whereas agents with access to explicit “opponent ID” signals or restricted to homogeneous training pools collapse to mutual defection (Weis et al., 18 Feb 2026).
  • In language games, Bayesian agents surpass static baselines by up to 20 points in cross-embedding settings under semantic uncertainty (Bills et al., 2024).
  • In ad hoc teaming and MARL benchmarks (SMAC, MPE, GRF), active-inference architectures incorporating in-context teammate portraits outperform both non-communication and message-passing baselines (Wu et al., 24 Nov 2025).

Ablation studies confirm the centrality of in-context inference modules: without real-time adaptation to co-player signals, performance consistently drops (e.g., 15–30% on SMAC maps).

6. Extensions, Limitations, and Future Directions

Major limitations include scalability to high-dimensional action or observation spaces, efficiency and stability of in-context adaptation in long-horizon or multi-agent settings, and the integration of explicit meta-reasoning or regret minimization atop implicit sequence-model adaptation.

Future extensions under active investigation include:

  • Improving sample efficiency via meta-gradient or meta-inference augmentation.
  • Scaling transformer-based in-context inference to competitive or mixed-motive games.
  • Integrating explicit regret-minimization, opponent-prediction heads, or explicit recursive pragmatic reasoning.
  • Expanding model classes to handle continuous control, richer sensory modalities (e.g., haptics, vision, language), and larger teams.
  • Leveraging cross-modal and sociocultural priors for robust generalization in zero- or few-shot human–AI teaming contexts.

System design recommendations arising from accumulated empirical and theoretical evidence emphasize incorporating shared agency (joint utility), common-ground modeling, real-time Bayesian or sequence-based inference, and shallow but robust pragmatic reasoning as the most effective and tractable path to scalable, human-aligned cooperative systems.


Table: Approaches to In-Context Co-Player Inference (selected references)

| Domain | Principal Method(s) | Representative Papers |
|---|---|---|
| MARL / Social Dilemmas | Recurrent/transformer seq models, PPI, A2C | Weis et al., 18 Feb 2026; Li et al., 2024 |
| Cooperative Action (Overcooked) | Bayesian ToM + inverse planning | Lim et al., 2020 |
| Multi-Agent Inverse Planning | Bayesian inference over team structure | Shum et al., 2019 |
| Active Inference in Joint Action | Variational free-energy minimization + legibility | Maisto et al., 2022; Wu et al., 24 Nov 2025 |
| Language / Reference Games | Pragmatic/Bayesian seq2seq, model-type inference | Shaikh et al., 2023; Bills et al., 2024; Cope et al., 2023 |
| Multimodal Human-AI Teaming | Transformer fusion of behavioral trajectories/gaze | Hulle et al., 2024 |
| Image–Text Referential Games | CLIPScore-augmented transformer listener | Wu et al., 2023 |

In sum, in-context co-player inference is central to next-generation MARL, ad hoc teaming, and human–AI cooperation, providing the computational machinery for agents to dynamically model, predict, and adapt to the latent states and strategies of their partners and opponents.
