Develop training procedures for contextual expertise leveraging without sacrificing robustness

Develop training procedures for large language models used in self-organizing multi-agent teams that enable contextual expertise leveraging, that is, appropriate deference to identified expert agents during deliberation, while preserving robustness to adversarial team members and manipulative inputs. The goal is to achieve strong synergy (matching or exceeding the best individual agent) without losing the consensus-seeking protection against adversarial influence observed in current RLHF-aligned models.
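One way to make the challenge concrete is as a reward-shaping problem for RLHF-style fine-tuning. The sketch below is a minimal illustration, not the paper's method; the function names (agreement, deference_robustness_reward) and the weight lambda_robust are assumptions. It scores a team's final answer by rewarding agreement with a designated expert while penalizing agreement with a planted adversary:

def agreement(a: str, b: str) -> float:
    # Toy exact-match agreement score; a real system would use a
    # learned or task-specific scorer instead.
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0

def deference_robustness_reward(
    team_answer: str,
    expert_answer: str,
    adversary_answer: str,
    lambda_robust: float = 0.5,
) -> float:
    # Hypothetical scalar training reward: reward deference to the
    # identified expert, penalize drift toward the planted adversary.
    # lambda_robust trades expertise utilization against robustness.
    deference = agreement(team_answer, expert_answer)
    capture = agreement(team_answer, adversary_answer)
    return deference - lambda_robust * capture

Tuning lambda_robust makes the trade-off explicit: at zero the objective trains pure deference, while large values recover something like the consensus-seeking caution of current models.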

Background

The paper shows that self-organizing multi-agent LLM teams consistently fail to achieve strong synergy: they underperform their best individual member across human-inspired tasks and modern ML benchmarks. Controlled ablations indicate that the primary bottleneck is leveraging known expertise rather than identifying it.
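Strong synergy, as used here, has a simple operational form: the team's score minus the score of its best individual member. A minimal sketch (the function name is an assumption):

def strong_synergy_gap(team_score: float, member_scores: list[float]) -> float:
    # Non-negative gap: the team matched or exceeded its best member
    # (strong synergy). Negative gap: the team held its best member
    # back, the failure mode reported in the paper.
    return team_score - max(member_scores)

# Example: a team scoring 0.71 whose best member scores 0.78 fails
# the strong-synergy test: strong_synergy_gap(0.71, [0.62, 0.78, 0.55])
# returns roughly -0.07.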

Conversational analysis reveals that teams default to integrative compromise (averaging expert and non-expert views) rather than epistemic deference to recognized expertise, which harms performance. However, this consensus-seeking behavior also provides robustness against adversarial agents, suggesting a trade-off between expertise utilization and manipulation resistance.
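The trade-off can be measured directly by running the same team protocol twice, once with a genuine expert added and once with a planted adversary. A hypothetical harness for this two-condition evaluation (tradeoff_profile and the run_team interface are assumptions, not the paper's code):

from collections.abc import Callable, Sequence

def tradeoff_profile(
    run_team: Callable[[str, str], float],
    tasks: Sequence[str],
    expert_id: str,
    adversary_id: str,
) -> dict[str, float]:
    # run_team(task, teammate_id) is assumed to return the team's
    # accuracy on `task` when the named teammate joins deliberation.
    utilization = sum(run_team(t, expert_id) for t in tasks) / len(tasks)
    resistance = sum(run_team(t, adversary_id) for t in tasks) / len(tasks)
    return {
        "expertise_utilization": utilization,   # higher = expert leveraged
        "manipulation_resistance": resistance,  # higher = adversary resisted
    }

A training method that meets the open challenge should raise both numbers together rather than trading one for the other.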

The authors conclude that addressing this trade-off likely requires changes in training procedures beyond prompt engineering or workflow design, motivating an explicit open challenge to develop training methods that enable context-appropriate expertise leveraging while maintaining robustness.

References

"While this provides robustness to adversarial input, developing training procedures that enable contextual expertise leveraging without sacrificing robustness remains an open challenge."

Multi-Agent Teams Hold Experts Back (2602.01011, Pappu et al., 1 Feb 2026), in Limitations and Conclusion (final paragraph)