Papers
Topics
Authors
Recent
Search
2000 character limit reached

The NordDRG AI Benchmark for Large Language Models

Published 11 Jun 2025 in cs.AI | (2506.13790v2)

Abstract: LLMs are already being piloted for clinical coding and decision support. However, until now, no open benchmark has targeted the hospital funding layer where Diagnosis-Related Groups (DRG) determine reimbursement across many countries. We release NordDRG-AI-Benchmark, the first public test-bed that captures a complete DRG rule set and evaluates an LLM's ability to reason over multilingual diagnosis, procedure, and tariff logic. The benchmark bundles three classes of artefacts: (i) definition tables with 20 interlinked tables covering DRG logic, ICD and NCSP codes, age/sex splits, and country flags; (ii) expert manuals and changelog templates describing real governance workflows; and (iii) a prompt pack of 14 CaseMix tasks that span code lookup, cross-table inference, multilingual terminology, and quality-assurance audits. All artefacts are available at: https://github.com/longshoreforrest/norddrg-ai-benchmark A baseline demonstration shows that five state-of-the-art LLMs perform very differently on the nine automatically verifiable tasks: o3 (OpenAI) scores 9 out of 9, GPT-4o and o4-mini-high score 7 out of 9, while Gemini 2.5 Pro and Gemini 2.5 Flash solve only 5 out of 9 and 3 out of 9, respectively. These results confirm that NordDRG-AI-Benchmark highlights domain-specific strengths and weaknesses that remain hidden in generic LLM benchmarks, offering a reproducible baseline for research on trustworthy automation in hospital funding.

Authors (1)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.