DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries

Published 31 Oct 2025 in cs.DB, cs.AI, cs.CL, and cs.IR | (2510.27238v1)

Abstract: Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully demonstrate all three key capabilities required to support them effectively: (1) open-domain data collection, (2) structured data transformation, and (3) analytic reasoning. To overcome these limitations, we propose DRAMA, an end-to-end paradigm that answers users' analytic queries in natural language on large-scale open-domain data. DRAMA unifies data collection, transformation, and analysis as a single pipeline. To quantitatively evaluate system performance on tasks representative of DRAMA, we construct a benchmark, DRAMA-Bench, consisting of two categories of tasks: claim verification and question answering, each comprising 100 instances. These tasks are derived from real-world applications that have gained significant public attention and require the retrieval and analysis of open-domain data. We develop DRAMA-Bot, a multi-agent system designed following DRAMA. It comprises a data retriever that collects and transforms data by coordinating the execution of sub-agents, and a data analyzer that performs structured reasoning over the retrieved data. We evaluate DRAMA-Bot on DRAMA-Bench together with five state-of-the-art baseline agents. DRAMA-Bot achieves 86.5% task accuracy at a cost of $0.05, outperforming all baselines with up to 6.9 times the accuracy and less than 1/6 of the cost. DRAMA is publicly available at https://github.com/uiuc-kang-lab/drama.

Summary

The paper introduces DRAMA, an end-to-end paradigm that unifies data collection, transformation, and analytic reasoning into one seamless pipeline.
It presents DramaBot, a multi-agent system that achieves 86.5% accuracy on DramaBench with an API cost of only $0.05 per task.
The evaluation covers two task categories, claim verification and question answering, each with 100 instances that require retrieving and analyzing open-domain data.

The paper "DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries" presents a new paradigm, Drama, that aims to automate the data science workflow by integrating data collection, transformation, and analysis into a unified pipeline. The paper introduces DramaBot, a multi-agent system built on Drama, and evaluates its performance using a new benchmark, DramaBench, which consists of various real-world analytic tasks requiring open-domain data retrieval and structured reasoning.

Drama Paradigm

Drama consists of three main stages, each designed to address specific limitations in current data science methodologies:

Data Collection: The collect function takes a user query in natural language and searches open domains for relevant raw data. This goes beyond what existing web search tools support, since they are often limited to simple text-based lookups. Drama instead targets large-scale data retrieval across diverse formats.
Data Transformation: Retrieved data, which can be in multiple formats like PDFs or Excel files, is transformed into structured data. By adopting a single-table representation, Drama addresses the challenges of dealing with heterogeneity in data sources, a step that existing systems often overlook.
Data Analysis: The analyze function frames answering a query over a structured table as semantic parsing, in which the query is translated into an executable program that can target various programming languages or frameworks. This stage supports analytic reasoning over the transformed data, a capability absent in systems focused on text generation.
Figure 1: Overview of the Drama paradigm. Here we present two examples: (left) user query as a question ($Q_1$), and (right) user query as a claim to be verified ($Q_2$).
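The three stages can be sketched as a chain of functions. This is a minimal illustration and not the paper's implementation: the names RawDocument, Table, and run_pipeline, the keyword-matching rule in collect, the CSV-only transform, and the hand-written analysis program are all assumptions made for the example. In particular, a real analyze step would derive the program from the query rather than receive it from the caller.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable


@dataclass
class RawDocument:
    """Hypothetical container for one retrieved item."""
    source: str   # short label describing where the data came from
    content: str  # raw payload; CSV text in this toy example


Row = dict[str, str]
Table = list[Row]


def collect(query: str, corpus: list[RawDocument]) -> list[RawDocument]:
    """Toy stand-in for open-domain retrieval: keyword overlap with the source label."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.source.lower().split())]


def transform(docs: list[RawDocument]) -> Table:
    """Merge every retrieved document into a single table, tagging rows with their source."""
    table: Table = []
    for doc in docs:
        for row in csv.DictReader(io.StringIO(doc.content)):
            table.append({**row, "source": doc.source})
    return table


def analyze(table: Table, program: Callable[[Table], object]) -> object:
    """Execute a program over the table (assumed here to be supplied by the caller)."""
    return program(table)


def run_pipeline(query: str, corpus: list[RawDocument],
                 program: Callable[[Table], object]) -> object:
    return analyze(transform(collect(query, corpus)), program)


# Made-up data, purely for illustration.
corpus = [
    RawDocument("city population 2020", "city,population\nA,100\nB,250\n"),
    RawDocument("rainfall records", "city,mm\nA,30\n"),
]
largest = run_pipeline(
    "which city has the largest population",
    corpus,
    lambda t: max(t, key=lambda r: int(r["population"]))["city"],
)
print(largest)
```

Keeping each stage behind a plain function boundary mirrors the single-pipeline idea: the analysis step only ever sees one table, regardless of how many sources were collected.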

DramaBench: Testing the Paradigm

DramaBench is introduced as a novel benchmark for evaluating the Drama paradigm. It includes two categories of tasks: claim verification and question answering, each involving real-world data collection and analysis:

Claim Verification: Tasks involve retrieving and analyzing data to determine the truthfulness of claims, enhancing the evaluation of fact-checking capabilities beyond simple information extraction.
Question Answering: These tasks require precise answers derived from structured data, demanding more complex reasoning than typical text-based QA systems.
Figure 2: Overview of each DramaBench task. Given a user query, the agent is tasked with collecting, structuring, and analyzing data from open domains to generate an answer.
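A rough sketch of how per-category accuracy on such a benchmark might be computed is shown below. The Task schema, the category labels, and the exact-match rule after whitespace and case normalization are assumptions for illustration; the benchmark's actual scoring procedure may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; field names are not taken from the benchmark."""
    category: str  # "claim_verification" or "question_answering"
    query: str
    gold: str      # expected label or answer, stored as text


def normalize(answer: str) -> str:
    # Assumed matching rule: ignore surrounding whitespace and letter case.
    return answer.strip().lower()


def accuracy_by_category(tasks: list[Task], predictions: list[str]) -> dict[str, float]:
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for task, pred in zip(tasks, predictions):
        total[task.category] += 1
        if normalize(pred) == normalize(task.gold):
            correct[task.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Made-up tasks and predictions, purely for illustration.
tasks = [
    Task("claim_verification", "Claim 1", "true"),
    Task("claim_verification", "Claim 2", "false"),
    Task("question_answering", "Question 1", "42"),
    Task("question_answering", "Question 2", "Paris"),
]
predictions = ["True", "true", "42 ", "paris"]
scores = accuracy_by_category(tasks, predictions)
print(scores)
```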

DramaBot: Implementation Details

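DramaBot operationalizes Drama with a data retriever that coordinates several sub-agents to collect and transform data, paired with a data analyzer that reasons over the result. The retrieval side includes: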

Web Browser: A sophisticated browsing agent capable of retrieving data by interacting with real-world websites. This component performs actions beyond standard web scraping tactics, including direct file downloads and careful navigation through web content.
Data Transformer: The data transformation process is guided by a dynamic table aggregation function capable of handling diverse data formats and linking them in a structured manner tailored to the user query.
Web Augmenter: Complements the web browser with large-scale data collection capabilities, using tools like the OpenAI search tool to fill gaps left by traditional browsing.
Figure 3: Overview of DramaBot.
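One way to picture the coordination among these sub-agents is sketched below. The fallback rule (call the augmenter only when the browser returns fewer than min_rows records), the function names, and the record format are all assumptions; the source does not specify how DramaBot decides when to invoke each sub-agent.

```python
from typing import Callable, Optional

Record = dict[str, object]
Collector = Callable[[str], list[Record]]


def retrieve(query: str, browser: Collector, augmenter: Collector,
             min_rows: int = 3) -> list[Record]:
    """Collect records with the browser, then fill gaps with the augmenter."""
    rows = list(browser(query))
    if len(rows) < min_rows:  # assumed gap-detection rule
        rows += [r for r in augmenter(query) if r not in rows]
    return rows


def to_single_table(rows: list[Record]) -> list[dict[str, Optional[object]]]:
    """Align heterogeneous records on the union of their columns (missing cells become None)."""
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col) for col in columns} for row in rows]


# Stub sub-agents returning made-up records, purely for illustration.
def fake_browser(query: str) -> list[Record]:
    return [{"year": 2020, "value": 1}]


def fake_augmenter(query: str) -> list[Record]:
    return [{"year": 2020, "value": 1}, {"year": 2021, "value": 2, "note": "est."}]


table = to_single_table(retrieve("example query", fake_browser, fake_augmenter))
print(table)
```

Passing the sub-agents in as plain callables keeps the coordinator testable with stubs, which is convenient because real browsing and search calls are slow and non-deterministic.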

Evaluation and Results

DramaBot reaches 86.5% accuracy on DramaBench at a cost of $0.05 per task, outperforming all five baseline agents with up to 6.9 times the accuracy and less than 1/6 of the cost. The summary attributes this to two aspects:

Accuracy: DramaBot consistently grounds its outputs in data retrieval and structured analysis, demonstrating strong analytical capabilities even in complex question answering tasks.
Data and Code Quality: DramaBot's robust performance in generating data and executing code underscores its comprehensive approach to data science.
Figure 4: The overall accuracy (%) of different agents across time. Each labeled time point marks the start of a three-month period.
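The headline numbers can be unpacked with a little arithmetic. The sketch below assumes that the 86.5% figure is an overall accuracy over all 200 benchmark instances, that $0.05 is a per-task cost, and that the "6.9 times" and "1/6" ratios are measured against at least one baseline; none of the derived values are reported in the source.

```python
from fractions import Fraction

instances = 100 + 100                # two task categories, 100 instances each
accuracy = Fraction(865, 1000)       # 86.5%
cost_per_task = Fraction(5, 100)     # $0.05, assumed to be per task

correct = accuracy * instances       # tasks answered correctly under this reading
total_cost = cost_per_task * instances

# "Up to 6.9 times the accuracy" implies the weakest baseline sits near this level.
weakest_baseline_accuracy = accuracy / Fraction(69, 10)

# "Less than 1/6 of the cost" implies some baseline costs more than this per task.
baseline_cost_floor = cost_per_task * 6

print(f"correct tasks: {correct}")
print(f"total cost: ${float(total_cost):.2f}")
print(f"implied weakest baseline accuracy: {float(weakest_baseline_accuracy):.1%}")
print(f"implied baseline cost above: ${float(baseline_cost_floor):.2f} per task")
```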

Conclusion

Drama is an end-to-end paradigm that addresses a gap in existing data science methodologies: integrating open-domain data retrieval with structured analytic reasoning. DramaBot's results on DramaBench support the paradigm's effectiveness and suggest its potential for real-world applications where dynamic data collection and complex analysis are both essential. The paper's contributions point toward AI systems that can answer data-driven analytic queries autonomously.