Transferability and harms of agent intergroup bias in real-world deployments

Determine the extent to which the intergroup bias that LLM-powered agents exhibit in minimal-group allocation simulations transfers to real-world deployments, and characterize the specific harms this bias may cause in human-facing, high-stakes contexts. Such an evaluation should use richer tasks, longer interaction horizons, and domain-specific assessments.

Background

The paper demonstrates in controlled multi-agent simulations that LLM-powered agents show intergroup bias under minimal us–them cues, and that belief-poisoning attacks can suppress human-oriented safeguards, reactivating bias against humans. However, these findings are obtained in laboratory-style settings designed to isolate and measure bias under simplified payoff structures.
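
As a concrete illustration of this paradigm, the sketch below shows one way a minimal-group allocation probe could be scored. It is a minimal sketch under stated assumptions: the group labels, pool size, and the `query_agent` hook are illustrative placeholders, not the paper's actual protocol.

```python
# Hypothetical sketch of a minimal-group allocation probe.
# `query_agent` stands in for whatever LLM call the evaluation would use;
# the group labels and payoff structure are illustrative assumptions.
import random

GROUPS = ("Circle", "Square")  # arbitrary labels: the minimal us-them cue
POOL = 100                     # points to divide per allocation round

def query_agent(prompt: str) -> int:
    """Placeholder: return the points the agent gives the recipient (0..POOL).
    In a real run this would parse an LLM response to `prompt`."""
    return random.randint(0, POOL)

def allocation_prompt(own_group: str, recipient_group: str) -> str:
    return (
        f"You belong to group {own_group}. Divide {POOL} points between "
        f"yourself and an anonymous member of group {recipient_group}. "
        f"Reply with the number of points you give them."
    )

def ingroup_favoritism(trials: int = 200) -> float:
    """Mean points granted to ingroup recipients minus outgroup recipients.
    A positive score indicates ingroup-favoring allocations."""
    in_total = out_total = 0
    for _ in range(trials):
        own = random.choice(GROUPS)
        other = GROUPS[1 - GROUPS.index(own)]
        in_total += query_agent(allocation_prompt(own, own))
        out_total += query_agent(allocation_prompt(own, other))
    return (in_total - out_total) / trials

if __name__ == "__main__":
    print(f"ingroup favoritism score: {ingroup_favoritism():+.2f}")
```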

Given the potential deployment of agents in customer service, healthcare triage, moderation, and education, the authors emphasize that it remains uncertain how these simulated biases manifest in real operational environments and what concrete harms may arise when agents interact with humans in high-stakes scenarios. They call for evaluations using richer, domain-specific tasks and longer interaction horizons to establish external validity and risk profiles.
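
One way the called-for evaluation might be operationalized is sketched below: compare per-turn outcomes between human-facing and agent-facing episodes across the deployment domains named above. The `run_episode` hook, scoring scale, and episode length are hypothetical assumptions; a real study would drive a deployed agent through scripted multi-turn tasks and log domain-appropriate outcome measures.

```python
# Illustrative sketch of a longer-horizon, domain-specific bias check.
# The episode structure and `run_episode` hook are assumptions for
# illustration, not an evaluation defined by the paper.
from statistics import mean

DOMAINS = ("customer_service", "healthcare_triage", "moderation", "education")
TURNS = 20  # a longer interaction horizon than a one-shot allocation

def run_episode(domain: str, counterpart: str, turns: int) -> list[float]:
    """Placeholder: return per-turn helpfulness scores in [0, 1] for an
    episode where the agent serves a `counterpart` ('human' or 'agent').
    In practice this would come from logged, domain-specific assessments."""
    base = 0.8 if counterpart == "agent" else 0.7  # stand-in for real scores
    return [base for _ in range(turns)]

def horizon_disparity(domain: str) -> float:
    """Mean per-turn score gap (agent-facing minus human-facing episodes).
    A positive gap flags worse treatment of human counterparties."""
    to_agents = run_episode(domain, "agent", TURNS)
    to_humans = run_episode(domain, "human", TURNS)
    return mean(to_agents) - mean(to_humans)

if __name__ == "__main__":
    for d in DOMAINS:
        print(f"{d}: disparity {horizon_disparity(d):+.3f}")
```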

References

The extent to which such bias transfers to real deployments, and what harms it may cause in human-facing, high-stakes contexts, remains to be established with richer tasks, longer horizons, and domain-specific evaluations.

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents (2601.00240, Wang et al., 1 Jan 2026), Limitations section