Predicting per‑query actual API cost for reasoning language models

Develop methods to predict the actual API cost c_m(q) of a reasoning language model m on a specific query q before issuing the request, using the model’s listed input and output token prices together with the query content. The prediction must account for model-specific thinking-token consumption, enabling cost-aware model selection.
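As a minimal sketch of the quantity being predicted, the snippet below computes actual per-query cost under the common billing scheme in which hidden thinking tokens are charged at the output rate. The function name, prices, and token counts are illustrative assumptions, not values from the paper; providers' billing rules vary.

```python
def actual_cost(p_in, p_out, n_in, n_out, n_think):
    """Actual per-query API cost, assuming hidden thinking tokens are
    billed at the output-token rate (provider-dependent assumption).

    p_in, p_out : price per token for input / output tokens
    n_in, n_out : input and visible output token counts for the query
    n_think     : hidden thinking tokens consumed (unknown before the call)
    """
    return p_in * n_in + p_out * (n_out + n_think)

# Hypothetical example: $3 / $15 per million tokens, 1,000 input tokens,
# 300 visible output tokens, 5,000 hidden thinking tokens.
cost = actual_cost(3e-6, 15e-6, 1_000, 300, 5_000)  # -> 0.0825
```

Since n_think is only known after the response arrives, predicting c_m(q) in advance amounts to predicting the model's thinking-token consumption on q.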

Background

The paper shows that listed API prices for reasoning LLMs do not reliably reflect real spending because models differ dramatically in their consumption of hidden thinking tokens, leading to frequent and sometimes extreme pricing reversals across tasks.

To enable cost-aware model selection without expensive pilot runs, the authors propose predicting the actual per-query cost in advance. However, initial experiments with simple baselines (a training-set mean, prompt-length regression, and embedding-based KNN) perform poorly on high-variance models, and repeated runs of identical queries reveal substantial within-query variability, suggesting an irreducible noise floor that limits achievable prediction accuracy.
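The three baselines and the noise-floor estimate can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; all function names are hypothetical, and the embeddings are assumed to come from any encoder of the query text.

```python
import numpy as np

def mean_baseline(train_costs):
    """Predict every query's cost as the training-set mean."""
    m = float(np.mean(train_costs))
    return lambda query: m

def length_regression_baseline(train_lengths, train_costs):
    """Least-squares fit of cost on prompt length (slope + intercept)."""
    X = np.column_stack([train_lengths, np.ones(len(train_lengths))])
    w, *_ = np.linalg.lstsq(X, np.asarray(train_costs, float), rcond=None)
    return lambda length: float(w[0] * length + w[1])

def knn_baseline(train_embs, train_costs, k=5):
    """Predict cost as the mean cost of the k nearest training queries
    in embedding space (embeddings assumed precomputed)."""
    E = np.asarray(train_embs, float)
    c = np.asarray(train_costs, float)
    def predict(emb):
        d = np.linalg.norm(E - np.asarray(emb, float), axis=1)
        return float(c[np.argsort(d)[:k]].mean())
    return predict

def within_query_variance(repeated_costs):
    """Irreducible noise-floor estimate: mean variance of cost across
    repeated runs of identical queries (rows = queries, cols = repeats)."""
    return float(np.mean(np.var(repeated_costs, axis=1)))
```

A nonzero `within_query_variance` bounds from below the mean squared error that any deterministic predictor of c_m(q) can achieve on that model.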

References

We formalize actual cost prediction as an open problem and provide initial evidence that it is challenging due to high per-query cost variance (Section~\ref{sec:priceinverse:prediction}).

The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More (2603.23971 - Chen et al., 25 Mar 2026), in Introduction, Contributions list (Section 1)