Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics

Published 2 Sep 2025 in cs.AI, cs.DB, cs.LG, and cs.MA | (2509.02751v1)

Abstract: With advances in LLMs, researchers are creating new systems that can perform AI-driven analytics over large unstructured datasets. Recent work has explored executing such analytics queries using semantic operators -- a declarative set of AI-powered data transformations with natural language specifications. However, even when optimized, these operators can be expensive to execute on millions of records and their iterator execution semantics make them ill-suited for interactive data analytics tasks. In another line of work, Deep Research systems have demonstrated an ability to answer natural language question(s) over large datasets. These systems use one or more LLM agent(s) to plan their execution, process the dataset(s), and iteratively refine their answer. However, these systems do not explicitly optimize their query plans which can lead to poor plan execution. In order for AI-driven analytics to excel, we need a runtime which combines the optimized execution of semantic operators with the flexibility and more dynamic execution of Deep Research systems. As a first step towards this vision, we build a prototype which enables Deep Research agents to write and execute optimized semantic operator programs. We evaluate our prototype and demonstrate that it can outperform a handcrafted semantic operator program and open Deep Research systems on two basic queries. Compared to a standard open Deep Research agent, our prototype achieves up to 1.95x better F1-score. Furthermore, even if we give the agent access to semantic operators as tools, our prototype still achieves cost and runtime savings of up to 76.8% and 72.7% thanks to its optimized execution.


Summary

The paper presents a prototype runtime that combines semantic operators with Deep Research systems to optimize analytics over unstructured data.
It reports up to 1.95x better F1-score than a standard open Deep Research agent, and up to 76.8% cost and 72.7% runtime savings relative to an agent that is given semantic operators as tools.
The prototype applies logical optimizations (query rewriting) and physical optimizations (such as reusing cached Contexts) to make query execution more efficient.

Deep Research as an AI-Driven Analytics System

Introduction to AI-Driven Analytics

AI-driven analytics aims to use LLMs to analyze large unstructured datasets efficiently. Traditional systems such as OLAP databases have long excelled at structured data but are poorly suited to unstructured workloads. The paper "Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics" proposes a runtime that combines the strengths of semantic operators and Deep Research systems.

Semantic operators are AI-powered data transformations specified in natural language; because they are declarative, a system can optimize how they are executed. Deep Research systems, by contrast, use one or more LLM agents to plan their execution, process the dataset(s), and iteratively refine their answer, but they do not explicitly optimize their query plans.
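To make the idea of a declarative semantic operator program concrete, here is a minimal sketch. It is not Palimpzest's API: the classes SemFilter and SemMap, the executor execute, and the helpers llm_yes_no and llm_generate are hypothetical, and the two helpers are offline stand-ins for real LLM calls (a keyword check and a first-sentence "summary").

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union

def llm_yes_no(predicate: str, text: str) -> bool:
    """Hypothetical LLM stand-in: true if the single-quoted keyword in the predicate occurs in the text."""
    match = re.search(r"'([^']+)'", predicate)
    return bool(match) and match.group(1).lower() in text.lower()

def llm_generate(instruction: str, text: str) -> str:
    """Hypothetical LLM stand-in: 'summarizes' by returning the first sentence."""
    return text.split(".")[0].strip()

@dataclass(frozen=True)
class SemFilter:
    predicate: str        # natural-language condition a record must satisfy

@dataclass(frozen=True)
class SemMap:
    instruction: str      # natural-language description of a derived field
    output_field: str

Operator = Union[SemFilter, SemMap]

def execute(program: List[Operator], records: List[str]) -> List[Dict[str, str]]:
    """Naive executor: applies each operator to every surviving record, in program order."""
    rows = [{"text": r} for r in records]
    for op in program:
        if isinstance(op, SemFilter):
            rows = [row for row in rows if llm_yes_no(op.predicate, row["text"])]
        elif isinstance(op, SemMap):
            rows = [{**row, op.output_field: llm_generate(op.instruction, row["text"])} for row in rows]
        else:
            raise TypeError(f"unknown operator: {op!r}")
    return rows

emails = ["The merger closes Friday. Legal has signed off.", "Lunch is at noon. Bring snacks."]
program = [
    SemFilter("The email discusses a 'merger'"),
    SemMap("Summarize the email in one sentence", "summary"),
]
result = execute(program, emails)
print(result[0]["summary"])  # The merger closes Friday
```

Because the program is plain data (a list of operator specifications) rather than hard-coded calls, an optimizer is free to reorder operators or swap in cheaper implementations before anything runs; that freedom is what "declarative" buys.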

The goal of this research is a runtime that combines the optimized execution of semantic operators with the flexible, dynamic execution of Deep Research systems for unstructured analytics tasks.

Semantic Operators and Deep Research Integration

Semantic operator systems have been applied effectively to tasks such as information extraction, ranking, and summarization over large unstructured datasets, using relational-style operators whose behavior is specified in natural language. Even when optimized, however, these operators can be expensive to execute on millions of records, and their iterator execution semantics make them ill-suited to interactive analytics tasks.
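The cost problem with iterator execution can be illustrated with a toy count query. In this sketch (all names hypothetical), each record pulled through an iterator-style semantic filter triggers one simulated LLM call, replaced here by a substring check so that only the call count matters. The number of calls tracks the dataset size, not the size of the answer.

```python
from typing import Iterable, Iterator, Tuple

class CallCounter:
    """Counts how many (simulated) LLM calls an operator makes."""
    def __init__(self) -> None:
        self.calls = 0

def sem_filter(records: Iterable[str], keyword: str, counter: CallCounter) -> Iterator[str]:
    """Iterator-style semantic filter: pull one record, make one 'LLM call', yield if it passes.

    The LLM call is simulated by a substring check; only the call count matters here.
    """
    for record in records:
        counter.calls += 1
        if keyword in record:
            yield record

def count_query(records: Iterable[str], keyword: str) -> Tuple[int, int]:
    """Answer 'how many records match?' and report how many LLM calls that took."""
    counter = CallCounter()
    matches = sum(1 for _ in sem_filter(records, keyword, counter))
    return matches, counter.calls

# Toy corpus: 100,000 records, of which every 1,000th is relevant.
records = [f"doc {i}" + (" relevant" if i % 1000 == 0 else "") for i in range(100_000)]
matches, calls = count_query(records, "relevant")
print(matches, calls)  # 100 100000
```

Only 100 records matter, yet the operator makes 100,000 calls; at millions of records and real LLM latency, this per-record pattern is what makes interactive use impractical.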

Conversely, Deep Research systems adapt dynamically to user queries through tool use and iterative, LLM-driven planning and execution. Because they do not explicitly optimize their query plans, however, they can end up executing poor plans.
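The plan, act, refine loop of a Deep Research agent can be sketched as follows. This is a simplified illustration, not any particular system's implementation: scripted_planner is a hypothetical stand-in for an LLM planner, and search_tool is a toy substring search. Note that nothing in the loop estimates cost or compares alternative plans; the agent simply executes whatever step the planner proposes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class AgentState:
    question: str
    notes: List[str] = field(default_factory=list)   # evidence gathered so far
    answer: Optional[str] = None

def search_tool(query: str, corpus: List[str]) -> List[str]:
    """Toy search tool: case-insensitive substring match over the corpus."""
    return [doc for doc in corpus if query.lower() in doc.lower()]

def scripted_planner(state: AgentState) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM planner: search first, then answer from the notes."""
    if not state.notes:
        return ("search", "outage")
    return ("answer", f"{len(state.notes)} matching documents")

Planner = Callable[[AgentState], Tuple[str, str]]

def run_agent(question: str, corpus: List[str], planner: Planner, max_steps: int = 5) -> AgentState:
    state = AgentState(question)
    for _ in range(max_steps):
        action, arg = planner(state)                       # plan the next step
        if action == "search":
            state.notes.extend(search_tool(arg, corpus))   # act: call a tool
        elif action == "answer":
            state.answer = arg                             # commit the refined answer
            break
    return state

corpus = ["Outage in region A", "Outage in region B", "Quarterly report"]
state = run_agent("How many outages were reported?", corpus, scripted_planner)
print(state.answer)  # 2 matching documents
```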

The proposed system lets Deep Research agents write and execute optimized semantic operator programs, aiming to combine dynamic execution with efficient processing of unstructured data.
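One ingredient of "optimized" execution is choosing a physical implementation for each logical operator. The sketch below shows a greedy, per-operator version under stated assumptions: PhysicalImpl, choose_plan, the model labels, and all cost and quality figures are hypothetical and purely illustrative. Real cost-based optimizers also account for filter selectivity, operator ordering, and how quality estimates are obtained.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class PhysicalImpl:
    name: str               # hypothetical implementation label, e.g. a model choice
    cost_per_record: float  # estimated cost per record (illustrative)
    quality: float          # estimated accuracy in [0, 1] (illustrative)

def choose_plan(candidates: Dict[str, List[PhysicalImpl]], min_quality: float) -> Dict[str, PhysicalImpl]:
    """For each logical operator, pick the cheapest implementation that meets the quality bar."""
    plan = {}
    for op, impls in candidates.items():
        eligible = [impl for impl in impls if impl.quality >= min_quality]
        if not eligible:
            raise ValueError(f"no implementation of {op!r} reaches quality {min_quality}")
        plan[op] = min(eligible, key=lambda impl: impl.cost_per_record)
    return plan

candidates = {
    "filter: mentions a merger": [
        PhysicalImpl("large-model", cost_per_record=0.010, quality=0.95),
        PhysicalImpl("small-model", cost_per_record=0.001, quality=0.90),
    ],
    "map: one-sentence summary": [
        PhysicalImpl("large-model", cost_per_record=0.020, quality=0.93),
        PhysicalImpl("small-model", cost_per_record=0.002, quality=0.70),
    ],
}
plan = choose_plan(candidates, min_quality=0.85)
print({op: impl.name for op, impl in plan.items()})
# {'filter: mentions a merger': 'small-model', 'map: one-sentence summary': 'large-model'}
```

The cheap model is accepted where its estimated quality is good enough and rejected where it is not; a Deep Research agent writing such a program gets this choice without having to make it itself.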

Implementation and Evaluation

The prototype extends the Palimpzest framework with compute and search operators that are backed by LLM agents. These operators process their inputs dynamically while still benefiting from the optimized execution of semantic operators.
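The following sketch illustrates how Context-passing search and compute operators might fit together. Everything here is an assumption for illustration: the Context fields, the signatures of search and compute, and keyword_agent (a word-overlap stub in place of an LLM agent) are hypothetical and are not taken from Palimpzest or the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical agent interface: (instruction, records) -> records the agent judges relevant.
Agent = Callable[[str, List[str]], List[str]]

@dataclass(frozen=True)
class Context:
    description: str            # natural-language description of the records
    records: Tuple[str, ...]

def search(ctx: Context, instruction: str, agent: Agent) -> Context:
    """Narrow a Context to the records relevant to an instruction, returning a new Context."""
    kept = agent(instruction, list(ctx.records))
    return Context(f"{ctx.description}; filtered by: {instruction}", tuple(kept))

def compute(ctx: Context, instruction: str, agent: Agent) -> str:
    """Answer an instruction over a Context; this toy version just counts relevant records."""
    return str(len(agent(instruction, list(ctx.records))))

def _words(text: str) -> set:
    """Lower-cased words longer than three characters, with trailing punctuation stripped."""
    return {w.lower().strip("?.,") for w in text.split() if len(w) > 3}

def keyword_agent(instruction: str, records: List[str]) -> List[str]:
    """Hypothetical stub for an LLM agent: keep records sharing a word with the instruction."""
    wanted = _words(instruction)
    return [r for r in records if wanted & _words(r)]

tickets = Context("support tickets", ("Login fails on mobile", "Billing page crashes", "Login timeout on web"))
login = search(tickets, "tickets about login", keyword_agent)
print(login.records)  # ('Login fails on mobile', 'Login timeout on web')
print(compute(login, "how many mention mobile?", keyword_agent))  # 1
```

Returning a new Context from search, rather than raw records, keeps a natural-language description attached to each intermediate result, which is what would let later operators (or a cache) reason about what a Context contains.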

Figure 1: Example queries over unstructured datasets on which semantic operators struggle because of high execution cost, and which the prototype handles through its dynamic, optimized execution.

In experiments on two basic queries, the prototype outperforms both a handcrafted semantic operator program and open Deep Research systems. It achieves up to 1.95x better F1-score than a standard open Deep Research agent, and even when that agent is given semantic operators as tools, the prototype still reduces cost by up to 76.8% and runtime by up to 72.7% thanks to its optimized execution.
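For readers unfamiliar with the metrics, the sketch below shows how figures of this kind are typically computed, assuming a set-based F1 over returned items, a ratio for "Nx better", and a percentage reduction for savings. The paper's exact scoring may differ, and the inputs are toy numbers, not the paper's measurements.

```python
def f1_score(predicted: set, truth: set) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

def improvement_factor(new_f1: float, baseline_f1: float) -> float:
    """How many times better the new F1 is (e.g. 1.95 means '1.95x better')."""
    return new_f1 / baseline_f1

def relative_savings(baseline: float, new: float) -> float:
    """Percentage reduction relative to the baseline (dollars, seconds, ...)."""
    return 100.0 * (baseline - new) / baseline

# Toy inputs only, not the paper's measurements.
print(f1_score({"a", "b", "c", "d"}, {"a", "b", "e", "f"}))  # 0.5
print(round(relative_savings(10.0, 2.32), 1))                # 76.8
```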

Optimizations for Efficient Execution

Logical and physical optimizations are central to the framework. Logical optimizations rewrite queries, for example to specify their scope more precisely or to merge similar operations and avoid redundant computation. Physical optimizations reuse earlier work: Contexts are materialized, and when a new instruction closely matches one that has already been processed, the corresponding cached Context is retrieved instead of being recomputed.
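Cached Context reuse can be sketched as a lookup keyed by instruction similarity. In this illustration, ContextCache, run_search, the 0.75 threshold, and the use of token-set Jaccard similarity are all assumptions; Jaccard is a simple lexical stand-in for whatever matching the prototype actually uses (an embedding-based retriever would be a common choice). The "expensive path" that matches on the instruction's last word stands in for an agent or LLM pass over the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two instructions (stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

@dataclass
class ContextCache:
    threshold: float = 0.75                      # minimum similarity for a cache hit
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def put(self, instruction: str, records: List[str]) -> None:
        self.entries[instruction] = records

    def get(self, instruction: str) -> Optional[List[str]]:
        """Return the records cached under the most similar instruction, if similar enough."""
        best, best_score = None, 0.0
        for cached_instruction, records in self.entries.items():
            score = jaccard(instruction, cached_instruction)
            if score > best_score:
                best, best_score = records, score
        return best if best_score >= self.threshold else None

def run_search(instruction: str, corpus: List[str], cache: ContextCache, stats: Dict[str, int]) -> List[str]:
    cached = cache.get(instruction)
    if cached is not None:
        stats["hits"] += 1
        return cached
    stats["misses"] += 1
    # Expensive path (an agent/LLM pass in a real system); here: match the instruction's last word.
    key = instruction.split()[-1].lower()
    result = [doc for doc in corpus if key in doc.lower()]
    cache.put(instruction, result)
    return result

corpus = ["Flood report 2021", "Wildfire report 2022", "Flood damage estimate"]
cache, stats = ContextCache(), {"hits": 0, "misses": 0}
r1 = run_search("find documents about flood", corpus, cache, stats)      # miss: computed
r2 = run_search("find all documents about flood", corpus, cache, stats)  # hit: similarity 0.8
r3 = run_search("find documents about wildfire", corpus, cache, stats)   # miss: similarity 0.6
print(stats)  # {'hits': 1, 'misses': 2}
```

The threshold carries real risk: at 0.6, the wildfire query would have been served the cached flood results. Lexical overlap is a crude proxy for semantic equivalence, which is one reason a learned similarity measure is the more likely choice in practice.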

Figure 2: Overview of a Palimpzest program, illustrating Context object creation and operator execution in the prototype's optimized approach to analytics.

Future Directions

The authors suggest several directions for future work, including better query optimization techniques and deeper integration of LLMs into an adaptive runtime. Materialized Contexts, combined with indexed caching, are promising both for long-term efficiency and for responding to changing environments.

More broadly, AI-driven analytics calls for further work on unifying structured and unstructured data processing, building on the strengths of existing systems while addressing their current limitations.

Conclusion

This study lays a foundation for AI-driven analytics by combining the dynamic execution of Deep Research systems with the optimized execution of semantic operators. Where traditional techniques struggle with unstructured data, the integrated prototype shows gains in both flexibility and efficiency, although its evaluation so far covers two basic queries.

In sum, the contribution of "Deep Research is the New Analytics System" lies in its proposal, and first prototype, for overcoming existing challenges in AI-driven data analytics, opening new possibilities for efficient, automated processing of unstructured data.