- The paper introduces the MASR problem, unifying collaboration mode selection, role allocation, and LLM assignment for multi-agent systems.
- The authors employ a cascaded controller network optimized with policy gradients, achieving significant performance improvements and cost reduction.
- Experiments validate MasRouter's scalability, plug-and-play integration, and strong inductive generalization in diverse benchmarks.
MasRouter: Learning to Route LLMs for Multi-Agent Systems
Introduction and Motivation
The proliferation of large language models (LLMs) has catalyzed the development of Multi-Agent Systems (MAS) that leverage the collective intelligence of heterogeneous agents for complex reasoning, code generation, and decision-making tasks. Single-agent LLM routing has been studied extensively as a way to optimize the trade-off between performance and computational cost, but existing routers are fundamentally limited in the MAS context: they do not address the challenges specific to multi-agent collaboration, such as dynamic selection of collaboration modes, agent role allocation, and heterogeneous LLM assignment. The paper "MasRouter: Learning to Route LLMs for Multi-Agent Systems" (2502.11133) introduces the Multi-Agent System Routing (MASR) problem and proposes MasRouter, a unified, inductive, and cost-effective solution for end-to-end MAS construction and routing.
Figure 1: Paradigm comparison between single-agent routing and multi-agent routing.
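The MASR problem is formalized as a mapping from a search space $\mathbb{S} = (M, R, T)$, comprising a pool of LLMs ($M$), agent roles ($R$), and collaboration modes ($T$), to a MAS instance $\mathcal{S}$ tailored for a given query $Q$. The MASR objective is to maximize expected utility (task performance) while minimizing cost (e.g., token usage, API calls), with the two balanced by a trade-off coefficient $\lambda$:
$$\max_{\mathbb{P}(\mathcal{S}\mid Q)} \ \mathbb{E}_{(Q,a)\sim\mathcal{D},\ \mathcal{S}\sim\mathbb{P}(\mathcal{S}\mid Q)}\Big[\,U(\mathcal{S};Q,a) - \lambda\cdot C(\mathcal{S};Q)\,\Big]$$
Here $(Q, a)$ is a query and its answer drawn from the dataset $\mathcal{D}$, $U$ is the utility of the constructed system, and $C$ is its cost.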
This formalization generalizes single-agent routing by incorporating collaboration topology and role allocation, enabling the construction of query-adaptive, heterogeneous MAS architectures.
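To make the role of λ concrete, the following minimal sketch scores a few candidate MAS configurations with U − λ·C and picks the best one. It is a deterministic stand-in for the expectation in the objective, not the paper's procedure; the `Candidate` class, the candidate names, and all utility and cost numbers are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    utility: float  # stands in for U(S; Q, a); hypothetical value
    cost: float     # stands in for C(S; Q); hypothetical value


def masr_score(candidate: Candidate, lam: float) -> float:
    """Utility minus lambda-weighted cost, as in the bracketed term of the objective."""
    return candidate.utility - lam * candidate.cost


def best_candidate(candidates, lam: float) -> Candidate:
    """Pick the candidate with the highest penalized score."""
    return max(candidates, key=lambda c: masr_score(c, lam))


# Hypothetical candidates: more agents and larger LLMs buy utility at higher cost.
CANDIDATES = [
    Candidate("single cheap agent", utility=0.70, cost=0.10),
    Candidate("three-agent debate", utility=0.85, cost=0.40),
    Candidate("five-agent complete graph", utility=0.90, cost=1.20),
]

for lam in (0.0, 0.1, 1.0):
    print(lam, best_candidate(CANDIDATES, lam).name)
```

With these made-up numbers, a larger λ shifts the choice from the most expensive configuration toward the cheapest one, which is the trade-off the coefficient is meant to control.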
MasRouter Architecture
MasRouter operationalizes MASR via a cascaded controller network, decomposed into three sequential modules:
- Collaboration Mode Determiner ($F_{\theta_t}$): Utilizes a variational latent variable model to select the optimal collaboration topology (e.g., Chain, Tree, Debate) for the input query. The latent representation $H$ is sampled from a query-conditioned Gaussian, and the collaboration mode is decoded via a learned mapping.
- Role Allocator ($F_{\theta_r}$): Progressively assigns roles to each agent using a structured probabilistic cascade, modeling dependencies and sequential constraints among roles (e.g., programmer, tester, analyst).
- LLM Router ($F_{\theta_m}$): Assigns an LLM backbone to each agent by modeling the selection as a multinomial distribution, conditioned on the query, collaboration mode, and assigned roles. The compatibility between each LLM and the constructed MAS is computed via learned embeddings and fusion modules.
Figure 2: The overall framework of the proposed MasRouter, illustrating the cascaded controller for collaboration mode, role allocation, and LLM routing.
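The cascade described above can be sketched as three chained sampling steps: a mode given the query, roles one at a time given the mode and earlier roles, and an LLM per agent given everything chosen so far. This is only an illustration under strong assumptions: `toy_logits` is a hypothetical stand-in for the learned scorers, the variational latent variable is omitted, the pools are shortened, and the agent count is passed in rather than decided by the controller.

```python
import math
import random

# Small illustrative pools; the pools described in the text are larger.
MODES = ["Chain", "Tree", "Debate"]
ROLES = ["Programmer", "Tester", "Analyst"]
LLMS = ["gpt-4o-mini", "llama-3.1-70b", "deepseek-v3"]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context, options):
    """Hypothetical scorer: a fixed pseudo-random logit per (context, option)."""
    return [random.Random(f"{context}|{opt}").uniform(-1.0, 1.0) for opt in options]


def sample(options, logits, rng, temperature):
    """Draw one option and return it together with its log-probability."""
    probs = softmax(logits, temperature)
    idx = rng.choices(range(len(options)), weights=probs, k=1)[0]
    return options[idx], math.log(probs[idx])


def route(query, num_agents, rng, temperature=1.0):
    """Sample a MAS configuration stage by stage and accumulate its log-probability."""
    mode, log_prob = sample(MODES, toy_logits((query,), MODES), rng, temperature)

    roles = []
    for _ in range(num_agents):
        # Each role is conditioned on the mode and on the roles already assigned.
        context = (query, mode, tuple(roles))
        role, lp = sample(ROLES, toy_logits(context, ROLES), rng, temperature)
        roles.append(role)
        log_prob += lp

    llms = []
    for position, role in enumerate(roles):
        # Each LLM is conditioned on the query, the mode, and the full role list.
        context = (query, mode, tuple(roles), position, role)
        llm, lp = sample(LLMS, toy_logits(context, LLMS), rng, temperature)
        llms.append(llm)
        log_prob += lp

    return {"mode": mode, "agents": list(zip(roles, llms)), "log_prob": log_prob}


config = route("Write a function that reverses a list.", 3, random.Random(0))
print(config["mode"], config["agents"])
```

The accumulated `log_prob` is the log of the factorized probability of the sampled configuration, which is the quantity a policy-gradient update needs.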
The entire pipeline is optimized end-to-end using policy gradient methods, with the reward signal reflecting both task utility and cost.
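As a rough illustration of training with a reward that mixes utility and cost, the sketch below runs plain REINFORCE with a moving-average baseline on a three-way choice. It is a toy bandit, not the paper's training loop: the policy is a bare vector of logits rather than a query-conditioned network, and `UTILITY`, `COST`, `LAMBDA`, the learning rate, and the step count are all hypothetical.

```python
import math
import random

# Hypothetical utility and cost for three candidate MAS configurations.
UTILITY = [0.50, 0.90, 0.95]
COST = [0.10, 0.30, 1.50]
LAMBDA = 0.5  # cost penalty


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(action):
    """Reward signal combining task utility and cost."""
    return UTILITY[action] - LAMBDA * COST[action]


def train(steps=3000, lr=0.2, seed=0):
    """REINFORCE on a softmax policy; returns the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(UTILITY)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        r = reward(action)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        for j in range(len(logits)):
            # Gradient of log softmax: indicator(j == action) - probs[j].
            grad_log_prob = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_prob
    return softmax(logits)


print([round(p, 3) for p in train()])
```

With these numbers the second configuration has the highest penalized reward (0.75), so the policy is expected to concentrate its probability mass there.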
Implementation Details
- LLM Pool: Includes diverse models such as GPT-4o-mini, Claude-3.5-Haiku, Gemini-1.5-Flash, Llama-3.1-70B, and Deepseek-v3, each with distinct cost and performance profiles.
- Role Pool: Contains 26 roles spanning various domains (e.g., MathAnalyst, Algorithm Designer, Critic).
- Collaboration Modes: Chain, Tree, Complete Graph, Debate, Reflection, and others.
- Optimization: Policy gradient with cost penalty λ; temperature τ for sampling; maximum agent count γ.
- Plug-and-Play: MasRouter can be integrated into existing MAS frameworks, enabling heterogeneous LLM assignment without retraining the base system.
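A quick count shows why this search space is routed with a learned controller rather than enumerated. The sketch below assumes ordered agents, repetition of roles and LLMs allowed, only the five collaboration modes named above, the five LLMs named above, and a hypothetical maximum agent count of 3; the function name `count_configurations` is illustrative.

```python
def count_configurations(num_modes, num_roles, num_llms, max_agents):
    """Count (mode, role sequence, LLM sequence) triples with 1..max_agents agents."""
    per_agent = num_roles * num_llms  # independent (role, LLM) choices per agent
    return num_modes * sum(per_agent ** k for k in range(1, max_agents + 1))


# 5 named modes, 26 roles, 5 named LLMs; max_agents=3 is an assumed value.
print(count_configurations(num_modes=5, num_roles=26, num_llms=5, max_agents=3))
# prints 11070150
```

Even under these simplifying assumptions there are roughly eleven million configurations per query, and the count grows geometrically with the agent limit.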
Experimental Results
MasRouter demonstrates superior performance and cost-effectiveness across five benchmarks (MMLU, GSM8K, MATH, HumanEval, MBPP):
- Performance: Achieves 1.8%–8.2% improvement over SOTA on MBPP; 3.51% average gain over RouterDC (best single-agent router).
- Cost: Reduces inference overhead by up to 52.07% on HumanEval and 40.22%–43.78% on MBPP compared to dynamic MAS baselines.
- Training Efficiency: Requires 69.57%–83.51% less training cost than trainable MAS pipelines (e.g., GPTSwarm, AFlow).
Figure 3: Pareto front comparison of performance and inference cost on MBPP, showing MasRouter's dominance in cost-effectiveness.
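For readers unfamiliar with Pareto fronts, a method is on the front if no other method is at least as cheap and at least as accurate while being strictly better on one of the two. The sketch below computes this for a list of (name, cost, performance) tuples; the method names and numbers are placeholders and are not results from the paper.

```python
def pareto_front(points):
    """Return names of points not dominated on (lower cost, higher performance)."""
    front = []
    for name, cost, perf in points:
        dominated = any(
            other_cost <= cost
            and other_perf >= perf
            and (other_cost < cost or other_perf > perf)
            for _, other_cost, other_perf in points
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder data: (method, inference cost, accuracy in percent).
METHODS = [
    ("method_a", 1.0, 80.0),
    ("method_b", 2.0, 85.0),
    ("method_c", 2.5, 83.0),  # dominated by method_b
    ("method_d", 0.8, 78.0),
    ("method_e", 3.0, 85.0),  # dominated by method_b
]

print(pareto_front(METHODS))  # ['method_a', 'method_b', 'method_d']
```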
Plug-and-Play Integration
When integrated with existing MAS frameworks (e.g., MAD, MacNet), MasRouter yields consistent performance gains (0.31%–1.55%) and substantial cost reductions (17.21%–28.17%), validating its utility as a modular routing layer.
Inductive Generalization
MasRouter exhibits strong inductive capability, efficiently incorporating new LLMs (e.g., Deepseek-v3) into the routing pool without retraining, and dynamically adjusting selection frequencies based on observed performance.
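One way such inductive behavior can arise is when the router scores each LLM through an embedding of its profile rather than through a fixed output slot, so that adding a model only adds one more vector to score. The sketch below illustrates that idea with a dot-product score and a softmax; the scoring rule, the three-dimensional vectors, and the names `route_probs`, `POOL`, and `CONTEXT` are assumptions for illustration, not the paper's architecture, and the adjustment of selection frequencies from observed performance is not modeled.

```python
import math


def route_probs(context_vec, llm_profiles):
    """Softmax over dot-product compatibility between a context and each LLM profile."""
    names = list(llm_profiles)
    scores = [
        sum(c * e for c, e in zip(context_vec, llm_profiles[name])) for name in names
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Hypothetical context embedding and LLM profile embeddings.
CONTEXT = [1.0, 0.5, 0.2]
POOL = {
    "gpt-4o-mini": [0.9, 0.1, 0.0],
    "llama-3.1-70b": [0.2, 0.8, 0.1],
}

before = route_probs(CONTEXT, POOL)

# Adding a model requires only a new profile vector; no parameters change.
EXTENDED_POOL = dict(POOL)
EXTENDED_POOL["deepseek-v3"] = [0.7, 0.6, 0.3]
after = route_probs(CONTEXT, EXTENDED_POOL)

print(before)
print(after)
```

In this setup the relative preference between the existing models is unchanged after the new model is added, while the new model competes for probability mass according to its own score.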
Ablation and Sensitivity
Ablation studies confirm that the LLM router module is most critical for performance, while cost penalty λ enables fine-grained control over the efficiency-effectiveness trade-off. Sensitivity analysis on agent count γ and λ provides practical guidance for deployment.
Practical and Theoretical Implications
Practical Implications:
- Scalability: Enables construction of large-scale, heterogeneous MAS with dynamic, query-adaptive routing.
- Cost Control: Provides explicit mechanisms for balancing performance and resource expenditure, critical for real-world deployment.
- Modularity: Can be integrated into existing MAS frameworks as a plug-in, facilitating rapid adoption.
Theoretical Implications:
- Generalization of Routing: Extends the routing paradigm from single-agent to multi-agent, incorporating collaboration topology and role allocation as first-class citizens.
- End-to-End Trainability: Modeling the discrete choices with a variational latent variable and multinomial distributions allows the whole cascade to be optimized with policy gradients over discrete MAS structures.
Limitations and Future Directions
The current approach assumes all LLMs in the pool are trustworthy; robustness to adversarial or poisoned LLMs remains an open challenge. Future work should address adversarial robustness, dynamic LLM pool management, and further scaling to hundreds or thousands of agents.
Conclusion
MasRouter establishes a new paradigm for LLM-powered MAS by unifying collaboration mode selection, role allocation, and LLM routing in a single, end-to-end trainable framework. It achieves state-of-the-art performance and efficiency, demonstrates strong inductive generalization, and is readily deployable as a modular component in existing MAS pipelines. This work lays the foundation for scalable, economical, and adaptive collective intelligence systems, and opens new avenues for research in robust, large-scale multi-agent orchestration.