- The paper introduces Arch-Router, a 1.5B model that aligns LLM routing with human preferences, outperforming state-of-the-art proprietary models by 7.71% on average in matching queries to those preferences.
- It employs a novel data creation pipeline with feedback loops to generate high-quality route configurations and diverse conversation scenarios.
- The framework improves multi-turn interactions and continuous context tracking, offering flexible and transparent model selection based on user intent.
Arch-Router: Aligning LLM Routing with Human Preferences
Introduction
The paper presents "Arch-Router", a framework developed to enhance the routing of LLMs by aligning routing decisions with human preferences. The framework addresses two critical limitations in existing routing approaches: reliance on benchmarks that do not adequately capture human preferences, and inflexibility stemming from a fixed pool of candidate models and the need for retraining when that pool changes. The proposed solution is a preference-aligned routing mechanism that uses a compact 1.5B model, Arch-Router, to guide model selection by matching user queries to domain-action preferences.
Figure 1: Preference-Aligned Routing Mechanism. The route policies and the user conversation are provided to the router, which selects the appropriate policy and its corresponding LLM. An example of usage in a coding application is shown on the right.
Methodology
Arch-Router Framework
The Arch-Router framework introduces a Domain-Action Taxonomy, allowing users to define routing policies expressed in natural language which correspond to specific LLM models. These policies guide routing decisions based on subjective human preferences. Arch-Router, a compact 1.5B LLM, efficiently maps user queries to these preferences, enabling seamless incorporation of new models without retraining.
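The policy-to-model mapping described above can be sketched as follows. This is a minimal illustration, not the paper's actual configuration format: the policy names, descriptions, and model identifiers are hypothetical, and in the real framework the compact 1.5B router model reads the policies together with the conversation and emits the matching policy name.

```python
# Hypothetical route configuration: each policy is a natural-language
# description of a Domain-Action intent, mapped to a preferred model.
# All names here are illustrative, not taken from the paper.
ROUTE_POLICIES = {
    "code_generation": {
        "description": "User asks to write new code or implement a feature.",
        "model": "model-a",
    },
    "code_debugging": {
        "description": "User reports an error or asks to fix existing code.",
        "model": "model-b",
    },
    "general_chat": {
        "description": "Casual conversation unrelated to any domain above.",
        "model": "model-c",
    },
}

def route(selected_policy: str) -> str:
    """Map the policy name emitted by the router LLM to a target model.

    Only the final policy-to-model lookup is sketched here; the router
    LLM's selection step is assumed to have already produced the name.
    Unknown policies fall back to the general-chat route (an assumption).
    """
    policy = ROUTE_POLICIES.get(selected_policy, ROUTE_POLICIES["general_chat"])
    return policy["model"]
```

Because the mapping lives in a plain configuration rather than in model weights, adding a new model only requires editing the table, which is what lets new models be incorporated without retraining.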
Data Creation
A novel data creation pipeline is employed to support the training of Arch-Router. The process is divided into three phases: generating route configurations through an LLM process with feedback loops, generating conversations from the resulting intents, and augmenting those conversations with diverse scenarios and irrelevant queries. This structured approach ensures the generation of high-quality conversational data, leading to improved model training outcomes.
Figure 2: Overview of data creation for the Arch-Router framework. Phase 1 generates route configurations through an LLM process with feedback loops. Phase 2 generates conversations from the generated intents. Phase 3 augments the conversations with diverse scenarios and irrelevant queries.
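The three phases can be outlined as a pipeline sketch. Everything below is an assumption-laden illustration: the `llm` function is a stub, and the feedback-loop acceptance criterion and augmentation prompts are hypothetical stand-ins for the paper's LLM-based quality checks.

```python
def llm(prompt: str) -> str:
    """Placeholder for an LLM call; stubbed for illustration only."""
    return f"output for: {prompt[:40]}"

def generate_route_config(domain: str, max_retries: int = 3) -> str:
    """Phase 1: generate a route configuration with a feedback loop.

    A critique pass is assumed to gate acceptance; the 'accept' signal
    below is hypothetical, not the paper's actual validation scheme.
    """
    config = ""
    for _ in range(max_retries):
        config = llm(f"Write a routing policy for domain: {domain}")
        feedback = llm(f"Critique this policy: {config}")
        if "accept" in feedback.lower():  # hypothetical acceptance signal
            break
    return config  # fall back to the last attempt if never accepted

def create_dataset(domains: list[str], scenarios: list[str]) -> list[str]:
    """Phase 2 generates conversations from intents; Phase 3 augments
    each conversation into diverse scenario variants (e.g. paraphrased
    or intentionally irrelevant turns)."""
    data = []
    for domain in domains:
        config = generate_route_config(domain)
        convo = llm(f"Generate a conversation matching: {config}")
        for scenario in scenarios:
            data.append(llm(f"Rewrite as a {scenario} variant: {convo}"))
    return data
```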
Experiments and Results
Arch-Router outperforms state-of-the-art proprietary LLMs by 7.71% on average in matching queries to human preferences across various datasets. The experiments show notable gains in multi-turn interactions, demonstrating Arch-Router's ability to track context across turns and adapt to shifting user intent. Across Turn, Span, and Conversation-level accuracy metrics, Arch-Router consistently leads, particularly in maintaining high span-level and conversation-level accuracy.
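The three accuracy levels can be made concrete with a small sketch. The exact definitions below are assumptions based on the names (the paper's formal definitions may differ): turn-level scores each turn independently, span-level requires every turn within a contiguous same-intent span to be correct, and conversation-level requires every turn in the whole conversation to be correct.

```python
def turn_accuracy(pred: list[str], gold: list[str]) -> float:
    """Fraction of individual turns routed to the correct policy."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def span_accuracy(pred: list[str], gold: list[str]) -> float:
    """Accuracy over contiguous spans sharing the same gold intent.

    A span counts as correct only if all of its turns are correct
    (an assumed all-or-nothing reading of span-level accuracy).
    """
    results, start = [], 0
    for i in range(1, len(gold) + 1):
        if i == len(gold) or gold[i] != gold[start]:
            results.append(pred[start:i] == gold[start:i])
            start = i
    return sum(results) / len(results)

def conversation_accuracy(pred: list[str], gold: list[str]) -> float:
    """1.0 only if every turn in the conversation is routed correctly."""
    return float(pred == gold)
```

Under these definitions, conversation-level is the strictest metric, which is why sustaining high conversation-level accuracy is strong evidence of continuous intent tracking.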
Analysis
Error pattern analysis reveals Arch-Router's higher susceptibility to ambiguity in initial queries, yet it remains robust across multi-turn interactions once the user's intent is correctly identified. By comparison, proprietary models like Claude-Sonnet-3.7 exhibit errors distributed throughout the conversation span, highlighting Arch-Router's advantage in continuous intent tracking.

Figure 3: Comparison of failure distributions for Arch-Router (left) and Claude-Sonnet-3.7 (right) on SGD dataset.
The preference-aligned routing strategy prioritizes human-defined criteria over automated benchmarks, offering distinct benefits in subjective settings where user preferences dictate the success of LLM interactions. This contrasts with performance-based routing, which optimizes model selections based on predicted performance scores. Preference-aligned routing, therefore, provides more transparency and control, aligning closely with human expectations.
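The contrast between the two routing strategies can be reduced to a toy example. The score values, intent names, and model identifiers are hypothetical: performance-based routing takes the argmax of predicted quality scores, while preference-aligned routing follows a human-written policy table, so its decisions are directly inspectable and editable.

```python
# Performance-based input: predicted benchmark scores (illustrative values).
PREDICTED_SCORES = {"model-a": 0.81, "model-b": 0.79}

# Preference-aligned input: a human-defined intent-to-model table.
PREFERENCE_TABLE = {"legal_review": "model-b"}

def performance_route(scores: dict[str, float]) -> str:
    """Pick the model with the highest predicted score."""
    return max(scores, key=scores.get)

def preference_route(intent: str, table: dict[str, str],
                     default: str = "model-a") -> str:
    """Pick the model the user's policy table assigns to this intent;
    the default fallback is an assumption for unlisted intents."""
    return table.get(intent, default)
```

Here the two strategies disagree on a legal-review query: the score-based router picks `model-a`, while the preference table routes to `model-b` because a human decided that model should handle legal work, regardless of aggregate benchmark scores. That legibility is the transparency-and-control benefit described above.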
Limitations
Despite its advantages, the framework's dependency on the clarity of policy descriptions and user-guided model selection poses limitations. Ambiguous policy descriptions can degrade performance, and improper model assignments can result in suboptimal routing decisions even with accurate query-routing alignment.
Conclusion
Arch-Router's preference-aligned framework redefines LLM routing by centering decisions around human experiences and preferences. The methodology introduces significant flexibility and adaptability, potentially setting a new standard for LLM deployment strategies. Future research could explore hybrid frameworks integrating preference-aligned and performance-based approaches to widen applicability further. The Arch-Router model and its training data pipeline demonstrate the viability of embedding human-centric objectives in LLM routing frameworks, paving the way for more nuanced and effective AI implementations.