- The paper shows that even minimal initial biases are amplified through recursive agent interactions in multi-agent systems (MAS), as measured by increasing Gini coefficients and variance.
- It introduces the Discrim-Eval-Open benchmark and models MAS as directed acyclic graphs to rigorously quantify bias via metrics such as variance and entropy.
- Findings reveal that increased agent specialization and complex communication topologies worsen bias, highlighting flaws in current single-agent alignment protocols.
Bias Amplification in Multi-Agent LLM Systems: A Systemic Empirical Analysis
Motivation and Problem Statement
The shift from deploying isolated LLMs to constructing Multi-Agent Systems (MAS) has fundamentally altered the topology of real-world AI deployments. MAS orchestrate complex workflows via agent specialization and structured communication, promising better practical performance but introducing potential systemic vulnerabilities, notably the iterative accumulation of bias. Contrary to the prevailing intuition that distributing reasoning across agents and roles should dilute stochastic biases, this paper demonstrates, at the system level, that recursive interaction topologies in MAS propagate and amplify even minimal initial biases into extreme, systemic opinion polarization. The empirical findings expose the inadequacy of current alignment protocols, designed for single-agent neutrality, in guaranteeing ethical robustness at the MAS level.
Figure 1: Two converging trends define the current AI landscape: rapid single-agent capability gains (left) and the emergence of complex multi-agent orchestration (right), foregrounding emergent error and bias accumulation as a foundational concern.
Experimental Framework: The Discrim-Eval-Open Benchmark
To evaluate systemic bias propagation under various MAS architectures, the authors introduce Discrim-Eval-Open, an open-ended, multi-attribute comparative judgment suite derived from the implicit variant of Anthropic's Discrim-Eval. Unlike binary (yes/no) tests, which can mask persistent model preferences, this benchmark forces comparative ranking across age, gender, and race, surfacing latent and propagated biases even in modern, highly aligned LLMs. Each agent in the MAS must output a probability distribution over three demographically distinct candidates, and all outputs (including rationales) are standardized for rigorous statistical analysis.
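As a rough illustration of the standardization step, the sketch below parses one agent's reply and coerces it into a valid three-way probability distribution. It assumes a JSON reply with hypothetical `scores` and `rationale` fields; the benchmark's actual output format is not reproduced here, and renormalizing slightly-off vectors instead of rejecting them is a design choice of this sketch.

```python
import json
from dataclasses import dataclass


@dataclass
class AgentOutput:
    """One agent's standardized answer (field names are hypothetical)."""
    scores: list[float]  # probability assigned to each of the k candidates
    rationale: str       # free-text justification passed to later agents


def parse_agent_output(raw: str, k: int = 3) -> AgentOutput:
    """Parse a JSON reply and coerce it into a valid k-way distribution."""
    data = json.loads(raw)
    scores = [float(s) for s in data["scores"]]
    if len(scores) != k:
        raise ValueError(f"expected {k} scores, got {len(scores)}")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must not all be zero")
    # Renormalize so the vector sums to 1 (absorbs percentages or rounding such as 33/33/33)
    return AgentOutput(
        scores=[s / total for s in scores],
        rationale=str(data.get("rationale", "")),
    )


reply = '{"scores": [50, 30, 20], "rationale": "Candidate A has more experience."}'
print(parse_agent_output(reply))
```

Rejecting malformed vectors outright (wrong length, negative entries) while renormalizing the rest keeps every stored record directly comparable across agents.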
Figure 2: Distributional properties of Discrim-Eval-Open; balanced yet diverse benchmark enables robust quantification of systemic bias along sensitive demographic axes.
MAS are modeled formally as directed acyclic graphs in which agent nodes aggregate the rationales and scores of their predecessors. Each agent computes a probability vector over k options (typically k = 3), and the deviation of this vector from the uniform distribution operationalizes its bias. The polarization metrics of primary interest are the Gini coefficient, variance, and entropy of these vectors: higher Gini and variance, and lower entropy (which is maximal at the uniform distribution), indicate more extreme, non-neutral decisions. Amplification is quantified both per layer (relative to an agent's predecessors) and cumulatively (relative to the initial distribution).
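The three metrics and the two amplification views can be sketched in standard-library Python. This is a minimal sketch, assuming the Gini coefficient uses the usual mean-absolute-difference formula over the k probabilities, entropy is measured in nats, and amplification is a simple difference between layers of a linear chain (the paper may use ratios or another normalization). The score vectors in the usage lines are made up for illustration.

```python
import math
from itertools import product


def gini(p: list[float]) -> float:
    """Gini coefficient: 0 for a uniform vector, (k - 1) / k for a one-hot vector."""
    k = len(p)
    abs_diffs = sum(abs(a - b) for a, b in product(p, repeat=2))
    return abs_diffs / (2 * k * sum(p))


def variance(p: list[float]) -> float:
    """Population variance of the probabilities (0 when uniform)."""
    mean = sum(p) / len(p)
    return sum((x - mean) ** 2 for x in p) / len(p)


def entropy(p: list[float]) -> float:
    """Shannon entropy in nats; maximal (ln k) at the uniform distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def amplification(layers: list[list[float]], metric=gini):
    """Metric per layer, per-layer change, and cumulative change vs. layer 0.

    Assumes a linear chain, so each layer's predecessor is the previous layer.
    """
    values = [metric(p) for p in layers]
    per_layer = [after - before for before, after in zip(values, values[1:])]
    cumulative = [v - values[0] for v in values]
    return values, per_layer, cumulative


chain = [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30], [0.55, 0.25, 0.20]]
values, per_layer, cumulative = amplification(chain)
print([round(v, 3) for v in values], [round(d, 3) for d in per_layer])
```

For k = 3, a uniform vector has Gini coefficient and variance 0 and entropy ln 3, while a one-hot vector has a Gini coefficient of 2/3. Entropy therefore signals amplification by decreasing, so `amplification(chain, metric=entropy)` yields negative per-layer changes for a polarizing chain.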
Core Empirical Findings
Baseline: Bias Cascades Emerge Even in Linear MAS Chains
Even simple serial chains of identical, otherwise neutral agents systematically amplify minor stochastic biases present at the initial step. The effect generalizes across an extensive suite of state-of-the-art proprietary and open LLMs. Minor initial fluctuations, unobservable under traditional single-agent benchmarks, are exaggerated through recursive echoing of textual rationales and the sycophantic tendencies of downstream agents. The relative Gini coefficient increases monotonically across layers, demonstrating that iterative agentic reasoning does not self-correct.
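The qualitative pattern can be reproduced with a deliberately crude toy model. The sketch below assumes each agent simply sharpens its predecessor's distribution, raising each probability to a hypothetical exponent `alpha` > 1 and renormalizing, as a stand-in for rationale echoing. It is not the mechanism measured in the paper, which uses real LLM agents; it only shows how a near-uniform start can drift toward polarization when nothing in the chain pushes back.

```python
from itertools import product


def gini(p):
    """Gini coefficient of a probability vector (0 = perfectly neutral)."""
    k = len(p)
    return sum(abs(a - b) for a, b in product(p, repeat=2)) / (2 * k * sum(p))


def sharpen(p, alpha):
    """Hypothetical echo step: raise probabilities to alpha > 1, then renormalize."""
    weights = [x ** alpha for x in p]
    total = sum(weights)
    return [w / total for w in weights]


def run_chain(p0, n_agents=8, alpha=1.5):
    """Pass a distribution through a linear chain of identical toy agents."""
    history = [list(p0)]
    for _ in range(n_agents):
        history.append(sharpen(history[-1], alpha))
    return history


chain = run_chain([0.34, 0.33, 0.33])
for layer, p in enumerate(chain):
    print(layer, [round(x, 3) for x in p], round(gini(p), 3))
```

With `alpha=1.0` the distribution stays essentially unchanged at every layer, which isolates the echo step as the source of amplification in this toy setting.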
Figure 3: All configurations of agent specialization (personas, functional roles, hybrid) fail to suppress the monotonic increase of bias; even temporary dips (reflector agent) do not alter long-run amplification.
Structural Sophistication Exacerbates, Not Mitigates, Bias
MAS literature frequently suggests that architectural diversity (in agent persona, functional specialization, or communication topology) offers a route to bias mitigation by encoding diverse perspectives and breaking echo-chamber feedback loops. The authors' results contradict this hypothesis: increased specialization, hybridization of personas and functions, and more intricate interaction graphs (spindle, parallel, fully connected) all accelerate bias amplification, with the fully connected topology often showing the most extreme polarization by the time information reaches the final summarizer node.
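To make the topology comparison concrete, each architecture can be written as a map from every agent to the set of agents whose outputs it reads, with agents then run in topological order. The layouts below are assumptions for illustration: the linear and fully connected cases (every agent reads all earlier agents) follow the DAG description above, while the `spindle` widths and the `parallel` branch structure are hypothetical stand-ins for the paper's graphs. The sketch uses `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

# Each topology maps node -> set of predecessor nodes (the outputs it reads).


def linear(n):
    """Serial chain: agent i reads only agent i - 1."""
    return {i: ({i - 1} if i > 0 else set()) for i in range(n)}


def fully_connected(n):
    """Acyclic fully connected graph: agent i reads every earlier agent."""
    return {i: set(range(i)) for i in range(n)}


def layered(widths):
    """Connect every node in one layer to every node in the next layer."""
    graph, prev, next_id = {}, [], 0
    for w in widths:
        layer = list(range(next_id, next_id + w))
        for node in layer:
            graph[node] = set(prev)
        prev, next_id = layer, next_id + w
    return graph


def parallel(branches, depth):
    """Independent linear branches (depth >= 1) merging into one summarizer."""
    graph, tails, next_id = {}, [], 0
    for _ in range(branches):
        prev = None
        for _ in range(depth):
            graph[next_id] = {prev} if prev is not None else set()
            prev, next_id = next_id, next_id + 1
        tails.append(prev)
    graph[next_id] = set(tails)  # final summarizer reads every branch tail
    return graph


spindle = layered([1, 2, 3, 2, 1])  # hypothetical widen-then-narrow shape
order = list(TopologicalSorter(spindle).static_order())
print(order)
```

A runner would visit agents in `order`, feeding each one the outputs of its predecessor set, so the same agent code can be reused unchanged across all four topologies.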
Figure 4: Systematic exploration of architectural levers: adding diverse personas, specialized roles, or deeper/denser communication does not arrest inherent amplification dynamics.
Figure 5: Amplification is not suppressed, but exaggerated, by communication complexity or MAS depth; deepest systems exhibit rapid, sustained opinion polarization.
Robustness and Trigger Vulnerabilities
A salient contribution is the demonstration of MAS fragility to neutral external context. Introducing a single objective, plausible sentence (e.g., "Innovative achievements are often accomplished by young people…") into the prompt causes immediate and cascading amplification towards age-based bias, even when initial agents are well-aligned. The system essentially "locks in" an arbitrary stochastic preference, rapidly polarizing subsequent rationales and outputs.
Figure 6: A neutral factual trigger (bottom path) initiates radical amplification in a system previously robust to scenario ambiguity (top path), highlighting the system-level fragility inherent to iterative rationale passing.
Historical Depth and Memory Increase Polarization
The extent of context provided to agents (all predecessors' outputs versus the immediate predecessor's only) further influences amplification strength: exposing each agent to a deeper history of prior outputs intensifies the final bias.
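The two memory settings differ only in which earlier outputs are placed into an agent's prompt. A minimal sketch, assuming each earlier output is a hypothetical `(agent_name, scores, rationale)` tuple rendered as one plain-text line (the paper's actual prompt format is not reproduced here):

```python
Output = tuple[str, list[float], str]  # (agent_name, scores, rationale), hypothetical


def build_context(previous: list[Output], mode: str = "full") -> str:
    """Render earlier agents' outputs as the context for the next agent.

    mode="full": all predecessors' outputs (cumulative history)
    mode="last": only the immediate predecessor's output
    """
    if mode not in ("full", "last"):
        raise ValueError("mode must be 'full' or 'last'")
    visible = previous if mode == "full" else previous[-1:]
    lines = []
    for name, scores, rationale in visible:
        score_text = ", ".join(f"{s:.2f}" for s in scores)
        lines.append(f"[{name}] scores=({score_text}) rationale: {rationale}")
    return "\n".join(lines)


history = [
    ("agent_1", [0.36, 0.32, 0.32], "Candidate A shows initiative."),
    ("agent_2", [0.45, 0.28, 0.27], "Agreeing with agent_1 about candidate A."),
]
print(build_context(history, mode="full"))
print(build_context(history, mode="last"))
```

In "full" mode the context grows linearly with depth, so late agents see every earlier rationale repeated, which is one plausible route for the stronger amplification reported for cumulative history.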
Figure 7: Bias amplification accelerates with greater access to cumulative history versus single-predecessor communication; both pathways exhibit non-mitigating amplification.
Statistical Results and Demographic Trends
The results report consistent upward trajectories in measured bias (Gini coefficient, variance, and entropy-based polarization) across all tested topologies, persona configurations, and mixed-LLM agent populations. Notably, amplification persists even in heterogeneous MAS, whose final extremity falls in between when high- and low-bias LLMs are mixed. Analyzing final decisions over the benchmark, the MAS converges to non-random, distinct biases under the tested settings: preferential treatment for younger individuals, females, and Black candidates. This is a non-obvious result, given that single-agent outputs are nearly always neutral.
Figure 8: Mixed persona/function MAS (judger → doctor → engineer → summarizer) illustrates progressive variance-based bias strengthening across the agent chain, irrespective of LLM instantiation.
Figure 9: In the spindle topology, key node transitions mark inflection points for bias intensification, visible as a shift toward lighter (higher-variance) colors.
Figure 10: Increasing iteration rounds in deeply stacked, fully-connected MAS yields a stepwise, unidirectional increase in output bias (variance metric).
Theoretical and Practical Implications
The results necessitate a fundamental revision of MAS safety and alignment protocols. Architectural or agentic diversity, without targeted systemic interventions, cannot guarantee group-level neutrality. The recursive rationalization and sycophancy mechanisms shown here generalize to other emergent agentic failures, including hallucination and logical persistence errors. Furthermore, naïve hybridization of LLM agents (mixing architectures or prompt styles) offers only limited, non-robust benefits. The potential impact is substantial for any domain in which MAS are employed for high-stakes or socially sensitive workflows, including legal, financial, and governmental applications.
Potential Mitigations and Open Research Questions
While this work does not propose complete solutions, several avenues are suggested: algorithmic introduction of contrarian agents or system-level de-polarization loss functions, real-time monitoring and control for groupthink, and exploration of robust aggregation protocols for rationales and scores (beyond serial chaining or naïve summarization). Notably, measuring and preventing other forms of group-level emergent pathologies (hallucination, extremist reinforcement, self-consistency breakdown) should become a research priority.
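Of the suggested avenues, real-time monitoring is the simplest to sketch. The class below is a hypothetical monitor, not a method from the paper: it tracks the Gini coefficient of each agent's score vector as a run proceeds and flags the run when the value crosses an assumed ceiling (`max_gini`) or rises for `patience` consecutive agents. Both thresholds are illustrative, and what to do once a run is flagged (halting, inserting a contrarian agent, re-sampling) is left open.

```python
from itertools import product


def gini(p):
    """Gini coefficient of a probability vector (0 = perfectly neutral)."""
    k = len(p)
    return sum(abs(a - b) for a, b in product(p, repeat=2)) / (2 * k * sum(p))


class PolarizationMonitor:
    """Hypothetical groupthink monitor over one MAS run (not from the paper)."""

    def __init__(self, max_gini=0.4, patience=3):
        self.max_gini = max_gini    # assumed hard ceiling on polarization
        self.patience = patience    # number of consecutive increases that triggers a flag
        self.history = []

    def observe(self, scores):
        """Record one agent's score vector; return True if the run should be flagged."""
        self.history.append(gini(scores))
        if self.history[-1] > self.max_gini:
            return True
        window = self.history[-(self.patience + 1):]
        return len(window) == self.patience + 1 and all(
            later > earlier for earlier, later in zip(window, window[1:])
        )


monitor = PolarizationMonitor()
run = [[0.34, 0.33, 0.33], [0.36, 0.32, 0.32], [0.38, 0.31, 0.31], [0.40, 0.30, 0.30]]
flags = [monitor.observe(scores) for scores in run]
print(flags)
```

The trend rule catches slow, steady drift of the kind described for linear chains before any single output looks extreme, while the ceiling catches abrupt jumps such as those caused by a context trigger.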
Conclusion
This study demonstrates that current MAS designs, a centerpiece of next-generation AI system deployment, exhibit an intrinsic, architecture-driven bias amplification effect. The systemic nature of this failure is robust to agent specialization, model mix, and communication complexity. The critical insight is that alignment at the agent level does not guarantee systemic alignment; on the contrary, complex iterative reasoning magnifies both initial stochasticity and subtle, context-triggered biases. Future progress demands explicit consideration of group-level statistical and interactive dynamics in MAS design, deployment, and safety evaluation.
See "Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems" (2604.08963) for full methodology, code, and supplementary results.