Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems

Published 10 Apr 2026 in cs.MA and cs.AI | (2604.08963v1)

Abstract: While Multi-Agent Systems (MAS) are increasingly deployed for complex workflows, their emergent properties, particularly the accumulation of bias, remain poorly understood. Because real-world MAS are too complex to analyze entirely, evaluating their ethical robustness requires first isolating their foundational mechanics. In this work, we conduct a baseline empirical study investigating how basic MAS topologies and feedback loops influence prejudice. Contrary to the assumption that multi-agent collaboration naturally dilutes bias, we hypothesize that structured workflows act as echo chambers, amplifying minor stochastic biases into systemic polarization. To evaluate this, we introduce Discrim-Eval-Open, an open-ended benchmark that bypasses individual model neutrality through forced comparative judgments across demographic groups. Analyzing bias cascades across various structures reveals that architectural sophistication frequently exacerbates bias rather than mitigating it. We observe systemic amplification even when isolated agents operate neutrally, and identify a 'Trigger Vulnerability' where injecting purely objective context drastically accelerates polarization. By stripping away advanced swarm complexity to study foundational dynamics, we establish a crucial baseline: structural complexity does not guarantee ethical robustness. Our code is available at https://github.com/weizhihao1/MAS-Bias.


Authors (3)

Summary

The paper shows that even minimal initial biases are amplified through recursive agent interactions in MAS, as measured by increasing Gini coefficients and variance.
It employs the Discrim-Eval-Open benchmark and models MAS as directed acyclic graphs to rigorously quantify bias via metrics like variance and entropy.
Findings reveal that increased agent specialization and complex communication topologies worsen bias, highlighting flaws in current single-agent alignment protocols.

Bias Amplification in Multi-Agent LLM Systems: A Systemic Empirical Analysis

Motivation and Problem Statement

The shift from deploying isolated LLMs to constructing Multi-Agent Systems (MAS) has fundamentally altered the topology of real-world AI deployments. MAS orchestrate complex workflows via agent specialization and structured communication, promising enhanced practical performance but introducing potential systemic vulnerabilities, notably the iterative accumulation of bias. Contrary to the prevailing intuition that distributing reasoning across agents and roles should dilute stochastic biases, the paper shows at the system level that recursive interaction topologies in MAS propagate and amplify even negligible initial biases into extreme, systemic opinion polarization. These findings indicate that alignment protocols designed for single-agent neutrality do not guarantee ethical robustness at the MAS level.

Figure 1: Two converging trends define the current AI landscape: rapid single-agent capability gains (left) and the emergence of complex multi-agent orchestration (right), foregrounding emergent error and bias accumulation as a foundational concern.

Experimental Framework: The Discrim-Eval-Open Benchmark

To evaluate systemic bias propagation under various MAS architectures, the authors introduce Discrim-Eval-Open, an open-ended, multi-attribute comparative judgment suite derived from the implicit variant of Anthropic's Discrim-Eval. Unlike binary (yes/no) tests, which can mask persistent model preferences, this benchmark forces a comparative ranking across age, gender, and race, surfacing latent and propagated biases in modern, highly aligned LLMs. Each agent in the MAS chain must output a probability distribution over three demographically distinct candidates, and all outputs (including rationales) are standardized for statistical analysis.
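As a rough illustration of what "standardized" outputs could look like in practice, the sketch below parses one agent reply into a normalized probability distribution plus a rationale. The JSON reply format, the candidate labels `A`/`B`/`C`, and the function name `parse_agent_output` are all assumptions made for this example; the summary does not specify the benchmark's actual output schema.

```python
import json


def parse_agent_output(raw, candidates=("A", "B", "C")):
    """Parse one agent reply into (normalized scores, rationale).

    Assumed (hypothetical) reply format:
    {"scores": {"A": 0.5, "B": 0.3, "C": 0.2}, "rationale": "..."}
    """
    data = json.loads(raw)
    scores = data["scores"]
    if set(scores) != set(candidates):
        raise ValueError(f"expected scores for {candidates}, got {sorted(scores)}")
    values = [float(scores[c]) for c in candidates]
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("scores must not all be zero")
    # Renormalize so small rounding drift in a reply does not leak into
    # downstream statistics such as variance or entropy.
    probs = {c: v / total for c, v in zip(candidates, values)}
    return probs, str(data.get("rationale", ""))
```

Rejecting malformed replies rather than silently repairing them keeps later polarization statistics from being driven by parsing artifacts.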

Figure 2: Distributional properties of Discrim-Eval-Open; balanced yet diverse benchmark enables robust quantification of systemic bias along sensitive demographic axes.

Formal Model: MAS as Directed Information Flow Graph

MAS are modeled formally as directed acyclic graphs in which each agent node aggregates the rationales and scores of all its predecessors. Each agent computes a probability vector over $k$ options (typically $k=3$), and its bias is operationalized as the deviation of that vector from the uniform distribution. The polarization metrics of primary interest are the Gini coefficient, variance, and entropy of these vectors: a higher Gini coefficient or variance, or a lower entropy, indicates a more extreme, non-neutral decision. Amplification is quantified both per layer (relative to predecessors) and cumulatively (relative to the initial distribution).
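The three metrics can be sketched in a few lines of Python. This is a minimal illustration using standard textbook definitions (mean-absolute-difference Gini, population variance, Shannon entropy in nats). The paper's exact normalizations are not given in this summary, and measuring amplification as a simple difference between layers is an assumption of this example, as are the function names and the sample chain.

```python
import math


def gini(p):
    """Gini coefficient of a probability vector; 0 for the uniform vector."""
    k = len(p)
    mean_abs_diff = sum(abs(a - b) for a in p for b in p) / (k * k)
    return mean_abs_diff / (2 * (sum(p) / k))


def variance(p):
    """Population variance of the entries; 0 for the uniform vector."""
    mean = sum(p) / len(p)
    return sum((x - mean) ** 2 for x in p) / len(p)


def entropy(p):
    """Shannon entropy in nats; maximal (log k) for the uniform vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def amplification(chain, metric=gini):
    """Per-layer and cumulative change of `metric` along a chain.

    Assumption: amplification is measured as a difference, either
    against the previous layer or against the first layer.
    """
    values = [metric(p) for p in chain]
    per_layer = [b - a for a, b in zip(values, values[1:])]
    cumulative = [v - values[0] for v in values[1:]]
    return per_layer, cumulative


# Hypothetical three-layer chain drifting away from uniform.
example_chain = [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30], [0.55, 0.25, 0.20]]
print(amplification(example_chain, gini))
print(amplification(example_chain, entropy))
```

On a chain that drifts away from uniform, the Gini and variance differences are positive while the entropy differences are negative, which is why the direction of each metric has to be read separately.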

Core Empirical Findings

Baseline: Bias Cascades Emerge Even in Linear MAS Chains

Even simple serial chains of otherwise neutral, identical agents systematically amplify minor stochastic biases present in the initial step. The effect generalizes across an extensive suite of SOTA proprietary and open LLMs. Minor initial fluctuations—unobservable under traditional single-agent benchmarks—are exaggerated through recursive textual rationale echoing and sycophantic tendencies of downstream agents. The relative Gini coefficient increases monotonically across layers, demonstrating the non-self-correcting nature of iterative agentic reasoning.

Figure 3: All configurations of agent specialization (personas, functional roles, hybrid) fail to suppress the monotonic increase of bias; even temporary dips (reflector agent) do not alter long-run amplification.

Structural Sophistication Exacerbates, Not Mitigates, Bias

MAS literature frequently suggests that architectural diversity—be it in agent persona, functional specialization, or communication topology—offers a route to bias mitigation by encoding diverse perspectives and breaking echo chamber feedback loops. The authors refute this hypothesis: increased specialization, hybridization of personas and functions, and more intricate interaction graphs (spindle, parallel, fully-connected) all accelerate bias amplification, with the fully-connected topology often showing the most extreme polarization by the time information reaches the final summarizer node.
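To make the topology comparison concrete, the sketch below encodes three of the structures as predecessor maps and derives a valid execution order with the standard-library `graphlib.TopologicalSorter`. The builders are illustrative assumptions: in particular, "fully-connected" is read here as a DAG in which every agent sees every earlier agent, and the spindle topology is omitted because this summary does not define its shape.

```python
from graphlib import TopologicalSorter


def chain(n):
    """Linear chain: agent i reads only agent i - 1."""
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}


def parallel(width):
    """One source (0), `width` parallel workers, one summarizer (width + 1)."""
    preds = {0: []}
    for w in range(1, width + 1):
        preds[w] = [0]
    preds[width + 1] = list(range(1, width + 1))
    return preds


def fully_connected(n):
    """Dense DAG (assumed reading): agent i reads every agent before it."""
    return {i: list(range(i)) for i in range(n)}


def execution_order(preds):
    """Order in which agents can run so that all inputs are ready."""
    return list(TopologicalSorter(preds).static_order())


def edge_count(preds):
    """Number of directed communication links in the topology."""
    return sum(len(p) for p in preds.values())


for name, graph in [("chain", chain(4)),
                    ("parallel", parallel(2)),
                    ("fully_connected", fully_connected(4))]:
    print(name, execution_order(graph), edge_count(graph))
```

Under this encoding the dense graph has the most links for a given number of agents, so each rationale is re-read by more downstream agents than in a chain.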

Figure 4: Systematic exploration of architectural levers: adding diverse personas, specialized roles, or deeper/denser communication does not arrest inherent amplification dynamics.

Figure 5: Amplification is not suppressed, but exaggerated, by communication complexity or MAS depth; deepest systems exhibit rapid, sustained opinion polarization.

Robustness and Trigger Vulnerabilities

A salient contribution is the demonstration of MAS fragility to neutral external context. Introducing a single objective, plausible sentence (e.g., “Innovative achievements are often accomplished by young people…”) into the prompt causes immediate and cascading amplification towards age-based bias, even when initial agents are well-aligned. The system essentially “locks in” an arbitrary stochastic preference, rapidly polarizing subsequent rationales and outputs.

Figure 6: A neutral factual trigger (bottom path) initiates radical amplification in a system previously robust to scenario ambiguity (top path), highlighting the system-level fragility inherent to iterative rationale passing.

Historical Depth and Memory Increase Polarization

The extent of context provided to agents (all predecessors' outputs versus immediate predecessor only) further influences amplification strength. Exposing each agent to a deeper historical sequence intensifies final bias.
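The two context regimes can be sketched as a single helper that decides which earlier outputs an agent is shown. The record fields (`agent`, `scores`, `rationale`), the mode names, and the text layout are hypothetical choices for this example, not the paper's prompt format.

```python
def build_context(outputs, mode="full"):
    """Render the predecessor outputs that the next agent will see.

    outputs: list of dicts with 'agent', 'scores', 'rationale' keys,
             in chain order (hypothetical record format).
    mode:    'full' exposes every predecessor, 'last' only the
             immediate predecessor.
    """
    if mode == "full":
        visible = outputs
    elif mode == "last":
        visible = outputs[-1:]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return "\n\n".join(
        f"[{o['agent']}] scores={o['scores']}\n{o['rationale']}" for o in visible
    )


history = [
    {"agent": "agent_1", "scores": [0.34, 0.33, 0.33], "rationale": "Roughly equal."},
    {"agent": "agent_2", "scores": [0.40, 0.30, 0.30], "rationale": "Slight edge to the first."},
]
print(build_context(history, mode="last"))
```

In `full` mode every earlier rationale is repeated to each later agent, which is one plausible reading of why deeper history strengthens the echo effect described above.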

Figure 7: Bias amplification accelerates with greater access to cumulative history versus single-predecessor communication; both pathways exhibit non-mitigating amplification.

Statistical Results and Demographic Trends

The results report consistent trajectories toward greater polarization in Gini coefficient, variance, and entropy across all tested topologies, persona configurations, and mixed LLM agent populations. Amplification persists even in heterogeneous MAS: systems that mix high- and low-bias LLMs show intermediate levels of extremity. Across the benchmark, the final decisions of the MAS converge to distinct, non-random biases: preferential treatment for younger individuals, women, and Black candidates under the tested settings. This is a non-obvious result given that single-agent outputs are nearly always neutral.

Figure 8: Mixed persona/function MAS (judger → doctor → engineer → summarizer) illustrates progressive variance-based bias strengthening across the agent chain, irrespective of LLM instantiation.

Figure 9: In spindle topology, key node transitions mark inflection points for bias intensification, as seen in lighter (higher variance) color shift.

Figure 10: Increasing iteration rounds in deeply stacked, fully-connected MAS yields a stepwise, unidirectional increase in output bias (variance metric).

Theoretical and Practical Implications

The results call for a fundamental revision of MAS safety and alignment protocols. Architectural or agentic diversity, without targeted systemic interventions, cannot guarantee group-level neutrality. The recursive rationalization and sycophancy mechanisms shown here plausibly extend to other emergent agentic failures, including hallucination and the persistence of logical errors. Furthermore, naïve hybridization of LLM agents (mixing architectures or prompt styles) offers only limited, non-robust benefits. The potential impact is substantial for any domain in which MAS are employed for high-stakes or socially sensitive workflows, including legal, financial, and governmental applications.

Potential Mitigations and Open Research Questions

While this work does not propose complete solutions, several avenues are suggested: algorithmic introduction of contrarian agents or system-level de-polarization loss functions, real-time monitoring and control for groupthink, and exploration of robust aggregation protocols for rationales and scores (beyond serial chaining or naïve summarization). Notably, measuring and preventing other forms of group-level emergent pathologies (hallucination, extremist reinforcement, self-consistency breakdown) should become a research priority.
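One simple form that real-time monitoring could take is a check that flags the first layer whose polarization has risen too far above the first agent's. The sketch below is not a method from the paper: the Gini-based criterion, the `max_rise` threshold, and the function name `first_alarm` are assumptions chosen for illustration.

```python
def gini(p):
    """Gini coefficient of a probability vector; 0 for the uniform vector."""
    k = len(p)
    mean_abs_diff = sum(abs(a - b) for a in p for b in p) / (k * k)
    return mean_abs_diff / (2 * (sum(p) / k))


def first_alarm(layer_distributions, max_rise=0.1):
    """Index of the first layer whose Gini exceeds layer 0's by more
    than `max_rise`, or None if no layer does.

    `max_rise` is an illustrative threshold, not a value from the paper.
    """
    baseline = gini(layer_distributions[0])
    for i, p in enumerate(layer_distributions[1:], start=1):
        if gini(p) - baseline > max_rise:
            return i
    return None


# Hypothetical outputs of a three-agent chain.
layers = [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30], [0.55, 0.25, 0.20]]
print(first_alarm(layers, max_rise=0.1))
```

A controller could respond to an alarm by pausing the chain, re-running the flagged agent without upstream rationales, or routing the case to a contrarian agent; which intervention actually reduces polarization is left open by the source.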

Conclusion

This study demonstrates that current MAS designs—a centerpiece of next-generation AI system deployment—exhibit an intrinsic, architecture-driven bias amplification effect. The systemic nature of this failure is robust to agent specialization, model mix, and communication complexity. The critical insight is that alignment at the agent level does not guarantee systemic alignment; on the contrary, complex iterative reasoning magnifies both initial stochasticity and subtle, context-triggered biases. Future progress demands explicit consideration of group-level statistical and interactive dynamics in MAS design, deployment, and safety evaluation.

See "Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems" (2604.08963) for full methodology, code, and supplementary results.
