Cost estimates for non-OpenAI models in GDPval speed/cost analysis

Determine per-task model completion costs for Claude Opus 4.1, Gemini 2.5 Pro, and Grok 4 on the GDPval gold subset tasks, using the same evaluation setup as for the OpenAI models, so that these systems can be included in the paper’s speed and cost-savings analyses alongside the reported OpenAI models.

Background

The paper analyzes potential speed and cost savings from using frontier AI models on GDPval tasks by comparing model-assisted workflows with unaided expert professionals. For OpenAI models, the authors derive model completion time from API metadata and model cost from the invoiced cost per task.
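When invoiced per-task costs are not available, one possible approximation is to multiply the token counts reported in API response metadata by per-million-token list prices and then average cost and latency over tasks. The Python sketch below shows that arithmetic under stated assumptions. `TokenPrice`, `TaskUsage`, `task_cost_usd`, and `summarize` are hypothetical names, not part of any provider SDK or of the paper's tooling. The sketch assumes flat input/output pricing with any reasoning tokens billed as output tokens. Tiered, cached-input, or batch pricing would need extra terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical per-million-token list prices in USD.

    Fill these in from the provider's current price list; the values used
    in examples here are placeholders, not real rates.
    """
    input_per_mtok: float
    output_per_mtok: float


@dataclass(frozen=True)
class TaskUsage:
    """Per-task usage, assumed to be taken from API response metadata."""
    task_id: str
    input_tokens: int
    output_tokens: int  # assumption: includes any billed reasoning tokens
    latency_s: float    # wall-clock completion time for the task


def task_cost_usd(usage: TaskUsage, price: TokenPrice) -> float:
    """Estimate one task's cost from token counts under flat pricing."""
    return (
        usage.input_tokens * price.input_per_mtok
        + usage.output_tokens * price.output_per_mtok
    ) / 1_000_000


def summarize(usages: list[TaskUsage], price: TokenPrice) -> dict:
    """Average estimated cost and latency over a set of tasks."""
    if not usages:
        raise ValueError("need at least one task")
    n = len(usages)
    total_cost = sum(task_cost_usd(u, price) for u in usages)
    total_latency = sum(u.latency_s for u in usages)
    return {
        "tasks": n,
        "mean_cost_usd": total_cost / n,
        "mean_latency_s": total_latency / n,
    }
```

For example, `task_cost_usd(TaskUsage("t1", 10_000, 5_000, 60.0), TokenPrice(2.0, 10.0))` prices one hypothetical task at 0.07 USD using placeholder rates. Because this estimate comes from list prices rather than invoices, it may not match the invoiced-cost method the paper uses for the OpenAI models.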

However, comparable cost estimates for other leading models were not available, preventing their inclusion in the speed and cost analysis. Establishing these costs would enable direct cross-model comparison of efficiency and economics within the GDPval framework.

References

We were not able to obtain cost estimates for Claude, Gemini, and Grok.

— GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks (2510.04374 - Patwardhan et al., 5 Oct 2025) in Footnote in Subsection 3.2 “Speed and cost comparison,” Section 3 (Experiments and Results)
