Self-Organizing LLM Teams
- Self-organizing LLM teams are multi-agent systems where autonomous agents dynamically coordinate without pre-specified roles or fixed workflows.
- They employ mechanisms like open consensus deliberation, role emergence, and lesson sharing to adapt to evolving tasks while balancing robust decision-making with individual expertise.
- Empirical studies reveal that although these teams enhance adversarial robustness, integrative compromise often prevents them from fully leveraging the best-performing agent's expertise.
Self-organizing LLM teams are multi-agent systems in which autonomous LLM-based agents coordinate, deliberate, and allocate responsibilities or form coalitions without any pre-specified roles, fixed workflows, or externally imposed orchestration. Unlike engineered multi-agent frameworks—where agent roles, information routing, or aggregation logic are explicitly designed—self-organizing LLM teams rely on emergent, interaction-driven coordination akin to role emergence in high-performing human teams. These systems must dynamically infer expertise, adjust social norms, and adapt strategies in response to evolving tasks and environments. Research across several domains covers their architecture, failure modes, emergent behaviors, and design principles.
1. Definitions and Formal Problem Structure
A self-organizing LLM team comprises a set of heterogeneous LLM agents deliberating freely—sharing opinions, challenging one another, revising views—without explicit, hard-wired roles (such as "proposer", "critic"), workflows (such as voting or sequential decomposition), or fixed aggregation rules (Pappu et al., 1 Feb 2026).
The canonical evaluation adopts a "strong synergy" criterion: team performance is compared to the best single-agent performance. The synergy gap $\Delta$ for accuracy-based tasks is $\Delta = \max_i A_i - A_{\text{team}}$, where $A_i$ denotes agent $i$'s solo accuracy and $A_{\text{team}}$ the team's accuracy, so strong synergy corresponds to $\Delta \le 0$. A central challenge is for the team not only to identify the agent(s) with the highest expertise for a given task, but to fully leverage that expertise in group decisions—i.e., to achieve or exceed the performance of its best member without ex-ante routing or aggregation (Pappu et al., 1 Feb 2026).
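Under this criterion, the gap is straightforward to compute from per-agent and team accuracies. A minimal sketch (the function name and the illustrative accuracy values are assumptions, not from the cited study):

```python
def synergy_gap(agent_accuracies, team_accuracy):
    """Synergy gap: best single-agent accuracy minus team accuracy.

    Positive values mean the team fails to match its best member;
    zero or negative values indicate strong synergy.
    """
    return max(agent_accuracies) - team_accuracy

# A consensus that underperforms its strongest member:
gap = synergy_gap([0.55, 0.62, 0.81], 0.70)  # gap ≈ 0.11
```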
2. Mechanisms of Emergent Organization and Coordination
Self-organization in LLM teams arises through discussion, individual learning, mutual adaptation, or explicit mechanisms for capturing and surfacing expertise.
- Open Consensus Deliberation: Agents exchange views in free-form natural-language rounds, often producing a final answer via majority, random selection, or unconstrained consensus (Pappu et al., 1 Feb 2026). No agent occupies a persistent hierarchical role.
- Role Emergence: Dynamic leadership or expertise claims can be induced (e.g., through voting, or explicit declarations of subdomain strengths), but are not fixed in advance (Guo et al., 2024). Emergence can also be induced by interaction-centric frameworks that discover affinity groups or coalitions based on semantic coherence in dialogue (Furuya et al., 30 Oct 2025).
- Cooperative Communication Knowledge: Self-organizing teams can maintain evolving "cooperation knowledge lists" or similar shared structures, where agents jointly build a bank of textual hints or strategies for improved communication or planning efficiency (Li et al., 8 Jun 2025).
- Lesson Sharing: In lesson-based collaboration frameworks, agents generate, share, and select "lessons"—codified strategies or observations that help future search—without requiring any central controller or fixed pipeline (Liu et al., 29 May 2025).
- Auction or Bandit-based Resource Assignment: Team formation or resource allocation can be governed by decentralized mechanisms such as Vickrey auctions for collective adaptation (Blanco-Fernández et al., 2021) or bandit-style upper confidence bound strategies for agent selection in task graphs (Zhou et al., 4 Mar 2025).
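As an illustration of the auction-based mechanism above, a sealed-bid second-price (Vickrey) auction can assign a task to the agent with the highest self-assessed value while charging the second-highest bid. A minimal sketch (the agent names and bid values are hypothetical):

```python
def vickrey_assign(bids):
    """Assign a task by sealed-bid second-price (Vickrey) auction:
    the highest bidder wins but pays the second-highest bid, which
    makes truthful self-assessment a dominant strategy."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

winner, price = vickrey_assign({"coder": 0.9, "planner": 0.6, "critic": 0.4})
# winner: "coder"; price: 0.6 (the second-highest bid)
```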
3. Empirical Findings, Failure Modes, and Determinants of Performance
Experimental results across benchmarks and domains reveal key features and limitations of self-organizing LLM teams:
- Expertise Leveraging Gap: Teams systematically fail to match their highest-performing member. Even when explicitly informed about the expert, integrative compromise (i.e., averaging all views) dominates over epistemic deference, with synergy gaps up to 37.6% on certain ML benchmarks (e.g., Humanity’s Last Exam), and ∼8–38% across tasks (Pappu et al., 1 Feb 2026).
- Compromise vs. Deference: Conversational analysis shows non-experts habitually propose midpoints or compromises rather than deferring to experts. The incidence of "integrative compromise" correlates positively with performance loss (correlation coefficients up to $0.69$), while epistemic deference correlates negatively (Pappu et al., 1 Feb 2026).
- Team Size Effects: Larger teams exhibit greater synergy gaps; expertise signals are increasingly diluted as more agents participate in compromise, with the gap widening significantly as team size rises from 2 to 8 (Pappu et al., 1 Feb 2026).
- Robustness-Alignment Trade-off: The same consensus mechanisms that dilute expertise bolster adversarial robustness. Integrative compromise minimizes the impact of adversaries seeded with malicious inputs, at the cost of not fully utilizing available expertise (Pappu et al., 1 Feb 2026).
- Ad Hoc and Spontaneous Organization: In competitive and mixed-motive environments (e.g., Avalon game, Keynesian Beauty Contest), LLM teams can discover and maintain cooperation, role assignment, or tacit collusion without external signals, provided communication channels, incentive alignment, or emergent social norms are present (Shi et al., 2023, Wu et al., 2024).
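The team-size dilution effect can be illustrated with a toy model: one expert and several near-chance agents voting on a binary task, with unweighted majority voting (ties broken at random) standing in for integrative compromise. This is a simplified simulation, not the cited study's protocol; the accuracy parameters are assumptions:

```python
import random

random.seed(1)

def majority_vote_accuracy(team_size, p_expert=0.9, p_other=0.5, trials=20000):
    """Accuracy of unweighted majority voting on a binary task with one
    expert (correct with prob. p_expert) and team_size - 1 weak agents
    (correct with prob. p_other); ties are broken at random."""
    correct = 0
    for _ in range(trials):
        votes = [random.random() < p_expert]
        votes += [random.random() < p_other for _ in range(team_size - 1)]
        right = sum(votes)  # number of agents voting for the true answer
        if 2 * right > len(votes) or (2 * right == len(votes)
                                      and random.random() < 0.5):
            correct += 1
    return correct / trials

for n in (2, 4, 8):
    # accuracy drifts toward chance as weak voters dilute the expert
    print(f"team size {n}: accuracy {majority_vote_accuracy(n):.3f}")
```

Even this crude model reproduces the qualitative pattern: the team never reaches the lone expert's accuracy, and adding non-experts makes things worse.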
4. Architectures, Algorithms, and Communication Protocols
A diversity of architectures underpin self-organizing LLM teams:
- Consensus-Based Free Chat: Most empirical studies implement rounds of open opinion exchange, random turn order, and aggregation by random or majority selection (Pappu et al., 1 Feb 2026).
- Shared Knowledge Evolution: In frameworks such as LIET, agents continuously reflect on interaction histories, revise shared communication hints, and embed these evolving strategies into future prompts, supporting multi-agent adaptation during task execution (Li et al., 8 Jun 2025).
- Interaction-Centric Community Discovery: Graph-based methods measure semantic coherence across pairwise LLM dialogues to form model communities; groupings detected via community detection on the resulting model graph yield synergistic teams appropriate to domain-specific tasks (Furuya et al., 30 Oct 2025).
- Dynamic Agent Selection via Bandit Methods: In multi-agent graph orchestration, agent selection for each subtask node can be framed as a multi-armed bandit problem using data-driven agent profiles, upper confidence bounds, and reward feedback (Zhou et al., 4 Mar 2025).
- Lesson-Banking Collaboration: Each agent contributes lessons—(context, action, outcome)—to a shared bank, and selects high-priority and high-relevance lessons for guiding subsequent solution generation and mutual improvement (Liu et al., 29 May 2025).
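A minimal sketch of the lesson-banking pattern, with keyword-overlap relevance standing in for the selection heuristic (the data structures and scoring rule are illustrative assumptions, not the cited framework's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Lesson:
    context: str    # situation in which the lesson was learned
    action: str     # strategy or fix that was applied
    outcome: float  # observed benefit (positive = helped)

@dataclass
class LessonBank:
    """Shared (context, action, outcome) store from which agents pick
    the most relevant, highest-benefit lessons for the task at hand."""
    lessons: list = field(default_factory=list)

    def add(self, lesson):
        self.lessons.append(lesson)

    def select(self, task_keywords, k=2):
        def score(lesson):
            overlap = set(lesson.context.lower().split()) & task_keywords
            return (len(overlap), lesson.outcome)
        ranked = sorted(self.lessons, key=score, reverse=True)
        return [l for l in ranked[:k] if score(l)[0] > 0]

bank = LessonBank()
bank.add(Lesson("loop bounds off by one", "check inclusive ranges", 0.3))
bank.add(Lesson("slow nested loop", "hoist invariant computation", 0.5))
picked = bank.select({"nested", "loop", "timeout"})
# the lesson whose context overlaps the task most is ranked first
```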
The table below summarizes prominent architectures and their main coordination mechanisms:
| Study / Framework | Core Mechanism | Coordination Protocol |
|---|---|---|
| (Pappu et al., 1 Feb 2026) | Open discussion / free consensus | 4 rounds open chat, random selector |
| (Li et al., 8 Jun 2025) (LIET) | Utility-guided planning + evolving hints | Shared knowledge list, reflection |
| (Furuya et al., 30 Oct 2025) | Graph-based semantic community detection | Interaction graphs, Louvain method |
| (Zhou et al., 4 Mar 2025) (ReSo) | Task-graph agent selection, UCB-based retrieval | Two-stage, reward-driven search |
| (Liu et al., 29 May 2025) | Lesson banking and selection | Iterative lesson-sharing rounds |
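The UCB-based agent selection used by graph-orchestration approaches such as ReSo can be sketched with a standard UCB1-style rule; the agent names, reward model, and exploration constant below are illustrative assumptions rather than that framework's actual implementation:

```python
import math
import random

class UCBAgentSelector:
    """Select an agent per subtask with a UCB1-style rule: balance each
    agent's average observed reward against uncertainty from few trials."""

    def __init__(self, agents, c=0.7):
        self.agents = list(agents)
        self.c = c                        # exploration strength
        self.counts = {a: 0 for a in self.agents}
        self.totals = {a: 0.0 for a in self.agents}
        self.t = 0                        # total selections so far

    def select(self):
        self.t += 1
        for a in self.agents:             # try every agent once first
            if self.counts[a] == 0:
                return a
        def ucb(a):
            mean = self.totals[a] / self.counts[a]
            return mean + self.c * math.sqrt(math.log(self.t) / self.counts[a])
        return max(self.agents, key=ucb)

    def update(self, agent, reward):
        self.counts[agent] += 1
        self.totals[agent] += reward

random.seed(0)
true_skill = {"A": 0.3, "B": 0.8, "C": 0.5}  # hidden success probabilities
sel = UCBAgentSelector(true_skill)
for _ in range(1000):
    agent = sel.select()
    sel.update(agent, 1.0 if random.random() < true_skill[agent] else 0.0)
# reward feedback concentrates selections on the strongest agent over time
```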
5. Trade-offs, Strengths, and Limitations
- Autonomy and Robustness: Self-organizing teams maximize agent autonomy and adaptability to non-stationary domains, with emergent communication strategies that can be more robust to novel situations or adversarial settings (Pappu et al., 1 Feb 2026, Li et al., 8 Jun 2025).
- Efficiency and Scale: Lesson-banking and shared knowledge evolution (e.g., LIET, LessonL) let teams iteratively improve, outperforming fixed-role systems on code tasks and embodied planning (Liu et al., 29 May 2025, Li et al., 8 Jun 2025).
- Scalability Challenges: Semantic graph-based methods and open consensus have inherent scaling bottlenecks (e.g., O(N²) pairwise dialogue growth), and group-dilution effects in large teams are substantial (Furuya et al., 30 Oct 2025, Pappu et al., 1 Feb 2026).
- Expertise Bottleneck: Consensus protocols and alignment-heavy training (favoring “helpfulness” and “agreeableness”) systematically suppress epistemic deference, which is essential for outperforming single-agent experts (Pappu et al., 1 Feb 2026).
- Emergent Roles but No Rigidity: While dynamic leadership, expertise declaration, or lesson-driven subteam formation can emerge, the absence of explicit role assignment leaves domain specialization underexploited unless augmented by experience-driven or structural signals (Guo et al., 2024, Li et al., 8 Jun 2025).
6. Directions for Overcoming Expertise Leveraging Gaps
Research suggests several actionable directions for enhancing expertise utilization and synergy in self-organizing LLM teams:
- Deference Incentives: Explicitly train LLM agents to recognize and defer to demonstrated experts, modifying reward structures to favor epistemic deference where appropriate (Pappu et al., 1 Feb 2026).
- Dynamic Role Declaration: Allow agents to claim domain-specific expertise, possibly including veto powers or weighted voting for domains where reliable expertise is critical (Pappu et al., 1 Feb 2026).
- Hybrid Architectures: Combine free-form deliberation with protocol fallbacks—weighted aggregation, prompt-based hierarchies, learned communication structures—to balance robustness and expertise utilization (Li et al., 8 Jun 2025, Guo et al., 2024).
- Feedback and Learning Mechanisms: Employ shared knowledge lists, lesson banks, or other feedback-encoded memory systems for group-level learning and intra-team knowledge transfer (Li et al., 8 Jun 2025, Liu et al., 29 May 2025).
- Scalable Specialization Discovery: Use interaction-centric clustering (embedding-based dialogue graphs) to detect and assemble subteams with latent domain specializations most suited for each subtask (Furuya et al., 30 Oct 2025).
- Incorporate Human-LLM and Heterogeneous Teams: Extending teams to combine LLMs with humans (or with diverse LLM architectures) can pair emergent group adaptation with explicit human oversight or domain knowledge (Li et al., 8 Jun 2025, Prakki, 2024).
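The first two directions can be combined in a simple aggregation rule: defer outright when one agent's declared expertise clears a threshold, otherwise fall back to expertise-weighted voting. A hypothetical sketch (the threshold value, weights, and answers are assumptions):

```python
from collections import defaultdict

def weighted_vote(answers, expertise, defer_threshold=0.8):
    """Aggregate answers with expertise weights. If one agent's declared
    expertise clears defer_threshold, defer to that agent outright;
    otherwise fall back to expertise-weighted voting."""
    top = max(expertise, key=expertise.get)
    if expertise[top] >= defer_threshold:
        return answers[top]               # epistemic deference
    tally = defaultdict(float)
    for agent, ans in answers.items():
        tally[ans] += expertise[agent]    # weighted compromise
    return max(tally, key=tally.get)

# A strong declared expert triggers deference despite being outnumbered:
deferred = weighted_vote({"a1": "X", "a2": "Y", "a3": "Y"},
                         {"a1": 0.9, "a2": 0.4, "a3": 0.3})
# Without a clear expert, the weighted majority decides:
voted = weighted_vote({"a1": "X", "a2": "Y", "a3": "Y"},
                      {"a1": 0.6, "a2": 0.4, "a3": 0.3})
```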
7. Implications and Perspectives
The ability of LLM teams to self-organize and coordinate presents substantial opportunities for autonomous science, enterprise automation, and embodied agents. However, a core limitation persists: current self-organizing LLM teams, as evaluated in (Pappu et al., 1 Feb 2026), systematically underperform their best member due to a strong bias for integrative compromise—averaging views—rather than full expertise leveraging, especially as group size increases. This robustly distinguishes LLM collectives from elite human teams, where ad hoc deference to expertise is routine and necessary for strong synergy.
Moving beyond these bottlenecks requires both architectural and training innovations: mechanisms for dynamic deference, experience-driven reputation, hybrid deliberation/aggregation protocols, and feedback-rich communication evolution are all rich research directions. Addressing these will be critical for realizing truly synergistic, scalable, and resilient self-organizing LLM teams capable of outperforming their best member in complex, real-world domains (Pappu et al., 1 Feb 2026, Li et al., 8 Jun 2025, Liu et al., 29 May 2025, Furuya et al., 30 Oct 2025).