MasRouter: Learning to Route LLMs for Multi-Agent Systems

Published 16 Feb 2025 in cs.LG and cs.MA | (2502.11133v1)

Abstract: Multi-agent systems (MAS) powered by LLMs have been demonstrated to push the boundaries of LLM capabilities, yet they often incur significant costs and face challenges in dynamic LLM selection. Current LLM routing methods effectively reduce overhead in single-agent scenarios by customizing LLM selection for each query, but they overlook the critical decisions regarding collaboration modes and agent roles in MAS. In response to this challenge, we first introduce the problem of Multi-Agent System Routing (MASR), which integrates all components of MAS into a unified routing framework. Toward this goal, we propose MasRouter, the first high-performing, cost-effective, and inductive MASR solution. MasRouter employs collaboration mode determination, role allocation, and LLM routing through a cascaded controller network, progressively constructing a MAS that balances effectiveness and efficiency. Extensive experiments demonstrate that MasRouter is (1) high-performing, achieving a $1.8\%\sim8.2\%$ improvement over the state-of-the-art method on MBPP; (2) economical, reducing overhead by up to $52.07\%$ compared to SOTA methods on HumanEval; and (3) plug-and-play, seamlessly integrating with mainstream MAS frameworks, reducing overhead by $17.21\%\sim28.17\%$ via customized routing. The code is available at https://github.com/yanweiyue/masrouter.


Summary

The paper introduces the MASR problem, unifying collaboration mode selection, role allocation, and LLM assignment for multi-agent systems.
The authors employ a cascaded controller network optimized with policy gradients, improving on the state of the art by 1.8% to 8.2% on MBPP and cutting overhead by up to 52.07% on HumanEval.
Experiments validate MasRouter's scalability, plug-and-play integration, and strong inductive generalization in diverse benchmarks.


Introduction and Motivation

The proliferation of LLMs has catalyzed the development of Multi-Agent Systems (MAS) that leverage the collective intelligence of heterogeneous agents for complex reasoning, code generation, and decision-making tasks. While single-agent LLM routing has been extensively studied to optimize the trade-off between performance and computational cost, existing approaches are fundamentally limited in the MAS context. They fail to address the unique challenges of multi-agent collaboration, such as dynamic selection of collaboration modes, agent role allocation, and heterogeneous LLM assignment. The "MasRouter: Learning to Route LLMs for Multi-Agent Systems" (2502.11133) paper introduces the Multi-Agent System Routing (MASR) problem and proposes MasRouter, a unified, inductive, and cost-effective solution for end-to-end MAS construction and routing.

Figure 1: Paradigm comparison between single-agent routing and multi-agent routing.

Formalization of Multi-Agent System Routing (MASR)

The MASR problem is formalized as a mapping from a search space $\mathbb{S} = (\mathbb{M}, \mathbb{R}, \mathbb{T})$, comprising a pool of LLMs ($\mathbb{M}$), agent roles ($\mathbb{R}$), and collaboration modes ($\mathbb{T}$), to a MAS instance $\mathcal{S}$ tailored for a given query $\mathcal{Q}$. The MASR objective is to maximize expected utility (task performance) while minimizing cost (e.g., token usage, API calls), with the two balanced by a trade-off coefficient $\lambda$:

$\underset{\mathbb{P}(\mathcal{S} | \mathcal{Q})}{\max}\; \mathbb{E}_{(\mathcal{Q},a) \sim \mathcal{D}, \mathcal{S}\sim \mathbb{P}(\mathcal{S} | \mathcal{Q})} \left[ U(\mathcal{S}; \mathcal{Q}, a) - \lambda \cdot C(\mathcal{S}; \mathcal{Q}) \right]$

This formalization generalizes single-agent routing by incorporating collaboration topology and role allocation, enabling the construction of query-adaptive, heterogeneous MAS architectures.
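To make the role of $\lambda$ concrete, the objective can be estimated by averaging $U - \lambda \cdot C$ over sampled runs. The following is a minimal sketch, not the authors' code: the `Rollout` type, the function name, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One sampled MAS run on one query (hypothetical container)."""
    utility: float  # U(S; Q, a), e.g. 1.0 if the final answer is correct
    cost: float     # C(S; Q), e.g. normalized token or API spend


def masr_objective(rollouts: List[Rollout], lam: float) -> float:
    """Monte Carlo estimate of E[U - lambda * C] over the given rollouts."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    return sum(r.utility - lam * r.cost for r in rollouts) / len(rollouts)


# Made-up numbers: a cheap system that is right 3 times out of 4,
# and an expensive system that is always right.
cheap = [Rollout(1.0, 0.2), Rollout(0.0, 0.2), Rollout(1.0, 0.2), Rollout(1.0, 0.2)]
pricey = [Rollout(1.0, 1.0) for _ in range(4)]

for lam in (0.1, 0.5):
    print(lam, masr_objective(cheap, lam), masr_objective(pricey, lam))
```

With these illustrative values, the expensive system scores higher at $\lambda = 0.1$ and the cheap one scores higher at $\lambda = 0.5$, which is the trade-off the coefficient is meant to control.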

MasRouter Architecture

MasRouter operationalizes MASR via a cascaded controller network, decomposed into three sequential modules:

Collaboration Mode Determiner ($\mathbb{F}_{\theta_t}$): Utilizes a variational latent variable model to select the optimal collaboration topology (e.g., Chain, Tree, Debate) for the input query. The latent representation $\mathbf{H}$ is sampled from a query-conditioned Gaussian, and the collaboration mode is decoded via a learned mapping.
Role Allocator ($\mathbb{F}_{\theta_r}$): Progressively assigns roles to each agent using a structured probabilistic cascade, modeling dependencies and sequential constraints among roles (e.g., programmer, tester, analyst).
LLM Router ($\mathbb{F}_{\theta_m}$): Assigns an LLM backbone to each agent by modeling the selection as a multinomial distribution, conditioned on the query, collaboration mode, and assigned roles. The compatibility between each LLM and the constructed MAS is computed via learned embeddings and fusion modules.
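The order in which the three modules condition on one another can be sketched as follows. This is a structural illustration only: the weight functions are uniform or hand-written placeholders standing in for the learned networks, and every function name here is hypothetical.

```python
import random
from typing import Dict, List, Tuple

MODES = ["Chain", "Tree", "Debate"]
ROLES = ["Programmer", "Tester", "Analyst"]
LLMS = ["GPT-4o-mini", "Claude-3.5-Haiku", "Llama-3.1-70B"]


def sample(weights: Dict[str, float], rng: random.Random) -> str:
    """Draw one key with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def mode_weights(query: str) -> Dict[str, float]:
    # Placeholder for the mode determiner: uniform over modes.
    return {m: 1.0 for m in MODES}


def role_weights(query: str, mode: str, chosen: List[str]) -> Dict[str, float]:
    # Placeholder for the role allocator: depends on roles chosen so far
    # by down-weighting roles that are already assigned.
    return {r: (0.2 if r in chosen else 1.0) for r in ROLES}


def llm_weights(query: str, mode: str, roles: List[str], role: str) -> Dict[str, float]:
    # Placeholder for the LLM router: uniform over the pool.
    return {m: 1.0 for m in LLMS}


def build_mas(query: str, num_agents: int, rng: random.Random) -> Tuple[str, List[Tuple[str, str]]]:
    """Sample mode, then roles one at a time, then one LLM per role."""
    mode = sample(mode_weights(query), rng)
    roles: List[str] = []
    for _ in range(num_agents):
        roles.append(sample(role_weights(query, mode, roles), rng))
    llms = [sample(llm_weights(query, mode, roles, r), rng) for r in roles]
    return mode, list(zip(roles, llms))


mode, agents = build_mas("Write a function that reverses a list.", 3, random.Random(0))
print(mode, agents)
```

Each stage sees the decisions made before it, which is the property the cascade is built around; in the actual system the placeholder weights would come from the trained controllers.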

Figure 2: The overall framework of the proposed MasRouter, illustrating the cascaded controller for collaboration mode, role allocation, and LLM routing.

The entire pipeline is optimized end-to-end using policy gradient methods, with the reward signal reflecting both task utility and cost.
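As a rough illustration of policy-gradient training with a cost-aware reward, the sketch below runs plain REINFORCE with a running-average baseline on a single categorical choice. It is a deliberately reduced stand-in, not the paper's training loop: the two-option setup, the learning rate, the baseline rule, and the utility and cost numbers are all assumptions.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits: List[float], action: int, reward: float,
                   baseline: float, lr: float) -> List[float]:
    """One REINFORCE update: grad of log pi(action) is onehot(action) - probs."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(utilities: List[float], costs: List[float], lam: float,
          steps: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    """Learn a categorical policy whose reward is utility - lam * cost."""
    rng = random.Random(seed)
    logits = [0.0] * len(utilities)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
        reward = utilities[action] - lam * costs[action]
        logits = reinforce_step(logits, action, reward, baseline, lr)
        baseline += 0.05 * (reward - baseline)  # running average of rewards
    return softmax(logits)


# Option 0 is cheap and weaker; option 1 is expensive and stronger (made-up values).
probs_small_lambda = train([0.6, 0.9], [0.1, 1.0], lam=0.1)
probs_large_lambda = train([0.6, 0.9], [0.1, 1.0], lam=1.0)
print(probs_small_lambda, probs_large_lambda)
```

With a small penalty the reward favors the stronger option, and with a large penalty it favors the cheaper one, so the learned probabilities shift accordingly.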

Implementation Details

LLM Pool: Includes diverse models such as GPT-4o-mini, Claude-3.5-Haiku, Gemini-1.5-Flash, Llama-3.1-70B, and Deepseek-v3, each with distinct cost and performance profiles.
Role Pool: Contains 26 roles spanning various domains (e.g., MathAnalyst, Algorithm Designer, Critic).
Collaboration Modes: Chain, Tree, Complete Graph, Debate, Reflection, and others.
Optimization: Policy gradient with cost penalty $\lambda$; temperature $\tau$ for sampling; maximum agent count $\gamma$.
Plug-and-Play: MasRouter can be integrated into existing MAS frameworks, enabling heterogeneous LLM assignment without retraining the base system.
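The effect of a sampling temperature can be shown with a small example. This sketch assumes $\tau$ acts as a standard softmax temperature that divides the scores before normalization; the summary does not spell out the exact form used, and the scores below are made up.

```python
import math
from typing import List


def softmax_with_temperature(scores: List[float], tau: float) -> List[float]:
    """Turn scores into probabilities; smaller tau gives a sharper distribution."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [s / tau for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical compatibility scores for three candidates
sharp = softmax_with_temperature(scores, tau=0.5)
flat = softmax_with_temperature(scores, tau=5.0)
print(sharp)
print(flat)
```

A low temperature concentrates probability on the top-scoring candidate, while a high temperature spreads it out and encourages exploration during training.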

Experimental Results

Performance and Cost

MasRouter demonstrates superior performance and cost-effectiveness across five benchmarks (MMLU, GSM8K, MATH, HumanEval, MBPP):

Performance: Achieves $1.8\%$ – $8.2\%$ improvement over SOTA on MBPP; $3.51\%$ average gain over RouterDC (best single-agent router).
Cost: Reduces inference overhead by up to $52.07\%$ on HumanEval and $40.22\%$ – $43.78\%$ on MBPP compared to dynamic MAS baselines.
Training Efficiency: Requires $69.57\%$ – $83.51\%$ less training cost than trainable MAS pipelines (e.g., GPTSwarm, AFlow).


Figure 3: Pareto front comparison of performance and inference cost on MBPP, showing MasRouter's dominance in cost-effectiveness.
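For readers unfamiliar with Pareto fronts, the sketch below computes one over (cost, accuracy) pairs. The method names and numbers are placeholders, not values from the paper.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, cost, accuracy)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both cost and accuracy."""
    front = []
    for name, cost, acc in points:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in points
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda p: p[1])


# Hypothetical methods: C costs more than B but is less accurate.
methods = [("A", 1.0, 70.0), ("B", 2.0, 80.0), ("C", 3.0, 75.0), ("D", 4.0, 85.0)]
front = pareto_front(methods)
print([name for name, _, _ in front])
```

A method on the front cannot be improved on one axis without giving something up on the other, so a method dominating such a plot is both cheaper and more accurate than the alternatives it is compared with.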

Plug-and-Play Integration

When integrated with existing MAS frameworks (e.g., MAD, MacNet), MasRouter yields consistent performance gains ( $0.31\%$ – $1.55\%$ ) and substantial cost reductions ( $17.21\%$ – $28.17\%$ ), validating its utility as a modular routing layer.

Inductive Generalization

MasRouter exhibits strong inductive capability, efficiently incorporating new LLMs (e.g., Deepseek-v3) into the routing pool without retraining, and dynamically adjusting selection frequencies based on observed performance.
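One way an embedding-based router can accept a new model without retraining is to score every candidate with the same function, so that adding a model only adds an entry to the pool. The sketch below illustrates that idea under strong assumptions: the two-dimensional vectors are invented, a dot product stands in for the learned compatibility module, and none of this reflects the actual embeddings used.

```python
import math
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def route_probs(context: List[float], pool: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over dot-product scores between a context vector and each LLM vector."""
    scores = {name: dot(context, emb) for name, emb in pool.items()}
    m = max(scores.values())
    exps = {name: math.exp(s - m) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Invented embeddings; in practice they would come from a shared encoder.
pool = {"GPT-4o-mini": [0.9, 0.1], "Llama-3.1-70B": [0.2, 0.8]}
context = [1.0, 0.0]  # hypothetical encoding of query, mode, and roles

before = route_probs(context, pool)
pool["Deepseek-v3"] = [1.2, 0.3]  # new model added; no parameters are updated
after = route_probs(context, pool)
print(before)
print(after)
```

Because the scoring function is shared across candidates, the new entry immediately receives a probability and the existing ones are renormalized around it.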

Ablation and Sensitivity

Ablation studies confirm that the LLM router module is most critical for performance, while cost penalty $\lambda$ enables fine-grained control over the efficiency-effectiveness trade-off. Sensitivity analysis on agent count $\gamma$ and $\lambda$ provides practical guidance for deployment.

Practical and Theoretical Implications

Practical Implications:

Scalability: Enables construction of large-scale, heterogeneous MAS with dynamic, query-adaptive routing.
Cost Control: Provides explicit mechanisms for balancing performance and resource expenditure, critical for real-world deployment.
Modularity: Can be integrated into existing MAS frameworks as a plug-in, facilitating rapid adoption.

Theoretical Implications:

Generalization of Routing: Extends the routing paradigm from single-agent to multi-agent, incorporating collaboration topology and role allocation as first-class citizens.
End-to-End Differentiability: The use of variational models and multinomial approximations enables gradient-based optimization over discrete MAS structures.

Limitations and Future Directions

The current approach assumes all LLMs in the pool are trustworthy; robustness to adversarial or poisoned LLMs remains an open challenge. Future work should address adversarial robustness, dynamic LLM pool management, and further scaling to hundreds or thousands of agents.

Conclusion

MasRouter establishes a new paradigm for LLM-powered MAS by unifying collaboration mode selection, role allocation, and LLM routing in a single, end-to-end trainable framework. It achieves state-of-the-art performance and efficiency, demonstrates strong inductive generalization, and is readily deployable as a modular component in existing MAS pipelines. This work lays the foundation for scalable, economical, and adaptive collective intelligence systems, and opens new avenues for research in robust, large-scale multi-agent orchestration.