Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers

Published 19 May 2025 in cs.LG | (2505.12601v1)

Abstract: As LLMs grow in scale and specialization, routing--selecting the best model for a given input--has become essential for efficient and effective deployment. While recent methods rely on complex learned routing strategies, their dependence on disparate training data and evaluation setups makes comparison and generalization difficult. In this work, we revisit LLM routing through the lens of simplicity. We show that a well-tuned k-Nearest Neighbors (kNN) approach not only matches but often outperforms state-of-the-art learned routers across diverse tasks. To support systematic evaluation, we introduce a suite of standardized routing benchmarks spanning instruction-following, question-answering, and reasoning tasks, as well as the first multi-modal routing dataset involving visual inputs. Our findings reveal that the locality properties of model performance in embedding space enable simple non-parametric methods to achieve strong routing decisions with lower sample complexity than parametric approaches. This challenges the prevailing trend toward sophisticated architectures and highlights the importance of thoroughly evaluating simple baselines before investing in complex solutions. To support reproducibility and further exploration, we will release all benchmarks and code upon publication.


Summary

The paper demonstrates that a well-tuned kNN router matches, and often outperforms, complex learned routers for LLM routing.
It introduces standardized routing benchmarks spanning instruction-following, question-answering, reasoning, and vision-language tasks, used to evaluate cost-quality tradeoffs.
The study finds that kNN routers need fewer training samples than parametric routers, and it offers practical guidance for deploying multi-model AI systems.

Introduction

The paper "Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers" studies LLM routing: selecting the most suitable LLM for a given input. As LLMs proliferate with differing capabilities and costs, a router is needed to keep response quality high while limiting computational expense. The paper challenges the assumption that routing requires a complex learned model by showing that a simple k-Nearest Neighbors (kNN) router can match or outperform state-of-the-art learned routers. The explanation it offers is locality: queries that are close in embedding space tend to be handled well by the same models, which favors non-parametric methods over complex architectures.

Benchmark Development

To enable like-for-like comparisons, the paper introduces a suite of standardized routing benchmarks covering instruction-following, question-answering, and reasoning tasks. It also introduces what the authors describe as the first multi-modal routing dataset, which involves visual inputs and is used to route among vision-language models (VLMs).

Figure 1: Cost-Quality tradeoff for text-based routing benchmarks using model selection approaches.

The kNN Approach

The kNN router is simple because it uses only local information in the embedding space. For a new query, it retrieves the nearest training queries, looks up how each candidate model performed on them, and routes accordingly. With this neighbor performance data, kNN reaches strong routing decisions with lower sample complexity than parametric approaches. This challenges the notion that sophisticated learned routers are needed for effective LLM routing.
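The idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `knn_route`, the use of cosine similarity, the unweighted mean over neighbors, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def knn_route(query_emb, train_embs, train_scores, k=3):
    """Route a query to the model with the best mean score among its k nearest neighbors.

    query_emb:    (d,)   embedding of the incoming query
    train_embs:   (n, d) embeddings of previously evaluated queries
    train_scores: (n, m) observed score of each of m models on each training query
    """
    # Cosine similarity between the query and every training embedding.
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q
    # Indices of the k most similar training queries.
    nn_idx = np.argsort(-sims)[:k]
    # Predicted score per model: plain average over the neighbors (an assumption;
    # similarity-weighted averaging is an equally plausible choice).
    predicted = train_scores[nn_idx].mean(axis=0)
    return int(np.argmax(predicted)), predicted

# Toy data (illustrative only): two clusters of queries, two candidate models.
train_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_scores = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

choice_a, _ = knn_route(np.array([0.95, 0.05]), train_embs, train_scores, k=2)
choice_b, _ = knn_route(np.array([0.05, 0.95]), train_embs, train_scores, k=2)
print(choice_a, choice_b)
```

There is nothing to train here: adding a newly evaluated query only appends one row to `train_embs` and `train_scores`.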

Evaluation Framework

The evaluation framework covers two kinds of router, utility prediction and model selection:

Utility Prediction Evaluation: The router predicts each model's performance score and cost for a query. Sweeping the cost-performance preference traces the Pareto front, and the area under this curve (AUC) summarizes how well the router balances performance against cost.
Selection-Based Evaluation: The router maps each query directly to a model and is evaluated at a set of distinct cost-performance preferences.
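The utility-prediction evaluation above can be sketched as follows. This is an illustration under assumptions rather than the paper's evaluation code: the utility form `score - lam * cost`, the helper names `tradeoff_curve` and `normalized_auc`, the normalization by cost range, and the toy numbers are all hypothetical. For brevity the sketch does not filter out dominated points, which a strict Pareto front would require.

```python
import numpy as np

def tradeoff_curve(pred_scores, pred_costs, true_scores, true_costs, lambdas):
    """For each preference lam, route every query to argmax(pred_score - lam * pred_cost)
    and record the mean realized (cost, quality) of those choices."""
    rows = np.arange(len(true_scores))
    points = []
    for lam in lambdas:
        choice = np.argmax(pred_scores - lam * pred_costs, axis=1)
        points.append((float(true_costs[rows, choice].mean()),
                       float(true_scores[rows, choice].mean())))
    return sorted(points)  # ordered by increasing cost

def normalized_auc(points):
    """Trapezoidal area under the (cost, quality) curve, divided by the cost range."""
    costs = np.array([p[0] for p in points])
    quals = np.array([p[1] for p in points])
    span = costs[-1] - costs[0]
    if span == 0:
        return float(quals.mean())
    area = np.sum((costs[1:] - costs[:-1]) * (quals[1:] + quals[:-1]) / 2.0)
    return float(area / span)

# Toy setup (illustrative only): model 0 is cheap, model 1 is expensive but stronger.
scores = np.array([[0.9, 1.0], [0.2, 1.0]])   # rows: queries, columns: models
costs = np.array([[1.0, 10.0], [1.0, 10.0]])

# A perfect predictor is assumed here, so predictions equal the true values.
points = tradeoff_curve(scores, costs, scores, costs, lambdas=[0.0, 0.05, 1.0])
auc = normalized_auc(points)
print(points, auc)
```

Small values of `lam` favor quality and send every query to the expensive model, while large values favor cost; intermediate values send only the hard query to the expensive model.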

The results show that kNN-based routers perform competitively, often surpassing more complex methods.

Figure 2: Cost-Quality tradeoff for VLM routing benchmarks using model selection approaches.

Theoretical Insights

The paper provides a theoretical grounding for the observed efficacy of kNN routers. It highlights the locality of model performance in embedding spaces and establishes the sample complexity advantage of kNN over parametric models. The kNN routers exhibit superior sample efficiency, particularly in low-dimensional spaces, emphasizing the importance of embedding quality.
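For intuition about why dimension matters, the classical kNN regression result is a useful reference point. The formulas below are the textbook statement, not necessarily the theorem proved in the paper, and the symbols ($s_m$ for model $m$'s true expected score, $N_k(x)$ for the $k$ nearest training queries, $n$ for the number of training queries, $d$ for the embedding dimension) are notation assumed for this note.

```latex
% kNN estimate of model m's score on a query with embedding x
\hat{s}_m(x) = \frac{1}{k} \sum_{i \in N_k(x)} s_m(x_i)

% Classical rate: if s_m is Lipschitz in x and the noise variance is bounded,
% then under standard regularity conditions, choosing k \asymp n^{2/(2+d)} gives
\mathbb{E}\left[ \left( \hat{s}_m(x) - s_m(x) \right)^2 \right] = O\!\left( n^{-2/(2+d)} \right)
```

The exponent $2/(2+d)$ shrinks as $d$ grows, so the same number of labeled queries yields a less accurate estimate in a higher-dimensional embedding space.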

Practical Implications and Future Work

The paper's findings advocate for simplicity in routing strategies, offering practical guidance for the deployment of multi-model systems. Future work may explore dynamic adaptation, alternative training signals, enhanced embedding learning, and batch routing, contributing to a larger discussion on how best to manage diverse AI models effectively.

Conclusion

By revisiting the fundamentals of LLM routing, the paper shows that simple methods like kNN can deliver strong model-selection performance. For practitioners and researchers, this suggests re-evaluating when added complexity is warranted, since a simpler router can lower the barrier to building efficient multi-model LLM systems.

The research challenges existing paradigms by demonstrating that thorough evaluation of simple baselines can yield insights that guide the development and deployment of AI systems in practical, cost-effective ways.
