- The paper introduces Arch-Router, a 1.5B model that aligns LLM routing with human preferences, outperforming state-of-the-art proprietary models by 7.71% on average in matching queries to those preferences.
- It employs a novel data creation pipeline with feedback loops to generate high-quality route configurations and diverse conversation scenarios.
- The framework improves multi-turn interactions and continuous context tracking, offering flexible and transparent model selection based on user intent.
Arch-Router: Aligning LLM Routing with Human Preferences
Introduction
The paper presents "Arch-Router", a framework developed to enhance the routing of LLMs by aligning routing decisions with human preferences. The framework addresses two critical limitations in existing routing approaches: reliance on benchmarks that do not adequately capture human preferences, and inflexibility stemming from a fixed pool of candidate models and the need for retraining when that pool changes. The proposed solution is a preference-aligned routing mechanism that uses a compact 1.5B model, Arch-Router, to guide model selection by matching user queries to domain-action preferences.
Figure 1: Preference-Aligned Routing Mechanism. The route policies and the user conversation are provided to the router, which selects the appropriate policy and its corresponding LLM. An example of usage in a coding application is shown on the right.
Methodology
Arch-Router Framework
The Arch-Router framework introduces a Domain-Action Taxonomy, allowing users to define routing policies expressed in natural language which correspond to specific LLM models. These policies guide routing decisions based on subjective human preferences. Arch-Router, a compact 1.5B LLM, efficiently maps user queries to these preferences, enabling seamless incorporation of new models without retraining.
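The policy-to-model mapping described above can be sketched as follows. This is a minimal illustration, not the paper's actual configuration format: the policy names, descriptions, and model identifiers are hypothetical, and in the real framework the compact 1.5B router model reads the policies together with the conversation and emits the matching policy name.

```python
# Hypothetical route configuration: each policy is a natural-language
# description of a Domain-Action intent, mapped to a preferred model.
# All names here are illustrative, not taken from the paper.
ROUTE_POLICIES = {
    "code_generation": {
        "description": "User asks to write new code or implement a feature.",
        "model": "model-a",
    },
    "code_debugging": {
        "description": "User reports an error or asks to fix existing code.",
        "model": "model-b",
    },
    "general_chat": {
        "description": "Casual conversation unrelated to any domain above.",
        "model": "model-c",
    },
}

def route(selected_policy: str) -> str:
    """Map the policy name emitted by the router LLM to a target model.

    Only the final policy-to-model lookup is sketched here; the router
    LLM's selection step is assumed to have already produced the name.
    Unknown policies fall back to the general-chat route (an assumption).
    """
    policy = ROUTE_POLICIES.get(selected_policy, ROUTE_POLICIES["general_chat"])
    return policy["model"]
```

Because the mapping lives in a plain configuration rather than in model weights, adding a new model only requires editing the table, which is what lets new models be incorporated without retraining.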
Data Creation
A novel data creation pipeline is employed to support the training of Arch-Router. The process is divided into three phases: generating route configurations through an LLM process with feedback loops, generating conversations from the resulting intents, and augmenting those conversations with diverse scenarios and irrelevant queries. This structured approach ensures the generation of high-quality conversational data, leading to improved model training outcomes.
Figure 2: Overview of data creation for the Arch-Router framework. Phase 1 generates route configurations through an LLM process with feedback loops. Phase 2 generates conversations from the generated intents. Phase 3 augments the conversations with diverse scenarios and irrelevant queries.
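The three phases can be outlined as a pipeline sketch. Everything below is an assumption-laden illustration: the `llm` function is a stub, and the feedback-loop acceptance criterion and augmentation prompts are hypothetical stand-ins for the paper's LLM-based quality checks.

```python
def llm(prompt: str) -> str:
    """Placeholder for an LLM call; stubbed for illustration only."""
    return f"output for: {prompt[:40]}"

def generate_route_config(domain: str, max_retries: int = 3) -> str:
    """Phase 1: generate a route configuration with a feedback loop.

    A critique pass is assumed to gate acceptance; the 'accept' signal
    below is hypothetical, not the paper's actual validation scheme.
    """
    config = ""
    for _ in range(max_retries):
        config = llm(f"Write a routing policy for domain: {domain}")
        feedback = llm(f"Critique this policy: {config}")
        if "accept" in feedback.lower():  # hypothetical acceptance signal
            break
    return config  # fall back to the last attempt if never accepted

def create_dataset(domains: list[str], scenarios: list[str]) -> list[str]:
    """Phase 2 generates conversations from intents; Phase 3 augments
    each conversation into diverse scenario variants (e.g. paraphrased
    or intentionally irrelevant turns)."""
    data = []
    for domain in domains:
        config = generate_route_config(domain)
        convo = llm(f"Generate a conversation matching: {config}")
        for scenario in scenarios:
            data.append(llm(f"Rewrite as a {scenario} variant: {convo}"))
    return data
```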
Experiments and Results
Arch-Router outperforms state-of-the-art proprietary LLMs by 7.71% on average in matching queries to human preferences across various datasets. The experiments show notable gains in multi-turn interactions, demonstrating Arch-Router's ability to track context across turns and adapt to shifting user intent. Across Turn, Span, and Conversation-level accuracy metrics, Arch-Router consistently leads, particularly in maintaining high span-level and conversation-level accuracy.
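The three accuracy levels can be made concrete with a small sketch. The exact definitions below are assumptions based on the names (the paper's formal definitions may differ): turn-level scores each turn independently, span-level requires every turn within a contiguous same-intent span to be correct, and conversation-level requires every turn in the whole conversation to be correct.

```python
def turn_accuracy(pred: list[str], gold: list[str]) -> float:
    """Fraction of individual turns routed to the correct policy."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def span_accuracy(pred: list[str], gold: list[str]) -> float:
    """Accuracy over contiguous spans sharing the same gold intent.

    A span counts as correct only if all of its turns are correct
    (an assumed all-or-nothing reading of span-level accuracy).
    """
    results, start = [], 0
    for i in range(1, len(gold) + 1):
        if i == len(gold) or gold[i] != gold[start]:
            results.append(pred[start:i] == gold[start:i])
            start = i
    return sum(results) / len(results)

def conversation_accuracy(pred: list[str], gold: list[str]) -> float:
    """1.0 only if every turn in the conversation is routed correctly."""
    return float(pred == gold)
```

Under these definitions, conversation-level is the strictest metric, which is why sustaining high conversation-level accuracy is strong evidence of continuous intent tracking.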
Analysis
Error pattern analysis reveals Arch-Router's higher susceptibility to ambiguity in initial queries, yet it remains robust across multi-turn interactions once the user's intent is correctly identified. By comparison, proprietary models like Claude-Sonnet-3.7 exhibit errors distributed throughout the conversation span, highlighting Arch-Router's advantage in continuous intent tracking.

Figure 3: Comparison of failure distributions for Arch-Router (left) and Claude-Sonnet-3.7 (right) on SGD dataset.
The preference-aligned routing strategy prioritizes human-defined criteria over automated benchmarks, offering distinct benefits in subjective settings where user preferences dictate the success of LLM interactions. This contrasts with performance-based routing, which optimizes model selections based on predicted performance scores. Preference-aligned routing, therefore, provides more transparency and control, aligning closely with human expectations.
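The contrast between the two routing strategies can be reduced to a toy example. The score values, intent names, and model identifiers are hypothetical: performance-based routing takes the argmax of predicted quality scores, while preference-aligned routing follows a human-written policy table, so its decisions are directly inspectable and editable.

```python
# Performance-based input: predicted benchmark scores (illustrative values).
PREDICTED_SCORES = {"model-a": 0.81, "model-b": 0.79}

# Preference-aligned input: a human-defined intent-to-model table.
PREFERENCE_TABLE = {"legal_review": "model-b"}

def performance_route(scores: dict[str, float]) -> str:
    """Pick the model with the highest predicted score."""
    return max(scores, key=scores.get)

def preference_route(intent: str, table: dict[str, str],
                     default: str = "model-a") -> str:
    """Pick the model the user's policy table assigns to this intent;
    the default fallback is an assumption for unlisted intents."""
    return table.get(intent, default)
```

Here the two strategies disagree on a legal-review query: the score-based router picks `model-a`, while the preference table routes to `model-b` because a human decided that model should handle legal work, regardless of aggregate benchmark scores. That legibility is the transparency-and-control benefit described above.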
Limitations
Despite its advantages, the framework's dependency on the clarity of policy descriptions and user-guided model selection poses limitations. Ambiguous policy descriptions can degrade performance, and improper model assignments can result in suboptimal routing decisions even with accurate query-routing alignment.
Conclusion
Arch-Router's preference-aligned framework redefines LLM routing by centering decisions around human experiences and preferences. The methodology introduces significant flexibility and adaptability, potentially setting a new standard for LLM deployment strategies. Future research could explore hybrid frameworks integrating preference-aligned and performance-based approaches to widen applicability further. The Arch-Router model and its training data pipeline demonstrate the viability of embedding human-centric objectives in LLM routing frameworks, paving the way for more nuanced and effective AI implementations.