LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources

Published 21 Oct 2025 in cs.AI, cs.CR, cs.DC, and cs.MA | (2510.18477v2)

Abstract: LLMs have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving computation across distributed data sources, but lacks support for natural language input and requires structured, machine-readable queries. In this work, we present LAFA, the first system that integrates LLM-agent-based data analytics with FA. LAFA introduces a hierarchical multi-agent architecture that accepts natural language queries and transforms them into optimized, executable FA workflows. A coarse-grained planner first decomposes complex queries into sub-queries, while a fine-grained planner maps each sub-query into a Directed Acyclic Graph (DAG) of FA operations using prior structural knowledge. To improve execution efficiency, an optimizer agent rewrites and merges multiple DAGs, eliminating redundant operations and minimizing computation and communication overhead. Our experiments demonstrate that LAFA consistently outperforms baseline prompting strategies by achieving higher execution plan success rates and reducing resource-intensive FA operations by a substantial margin. This work establishes a practical foundation for privacy-preserving, LLM-driven analytics that supports natural language input in the FA setting.


Summary

The paper introduces LAFA, which integrates LLM-driven natural language processing with federated analytics to enable privacy-preserving data queries over decentralized sources.
The methodology employs a hierarchical multi-agent system that decomposes complex queries into optimized DAG workflows, significantly reducing operation redundancy.
Results show near-perfect completion ratios and fewer FA operations than baseline prompting techniques.


The paper introduces LAFA, a system that combines large language models (LLMs) with federated analytics (FA) to enable privacy-preserving data analytics over decentralized data sources. By putting an LLM-driven natural language interface in front of FA, LAFA aims to answer complex queries while keeping the privacy protections of FA. Neither LLM-agent analytics frameworks, which assume centralized data access, nor existing FA frameworks, which require structured queries, can do this on their own.

System Overview

LAFA features a hierarchical multi-agent system designed to accept natural language queries and transform them into optimized FA workflows. The system comprises several key components:

Queriers: Users who submit natural language analytics queries.
Clients with Devices: Distributed devices where data is stored and processed locally to preserve privacy.
Server: Hosts the LLM-driven agents and acts as a central coordinator for aggregating results without accessing raw data.
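The division of labor between clients and server can be illustrated with a minimal sketch of a federated mean. This is not LAFA's implementation: the names `LocalSummary`, `client_summarize`, and `server_mean` are hypothetical, and the sketch omits the protections (for example secure aggregation) that a real FA deployment would add. It only shows that the server can compute a global statistic from per-client summaries without seeing raw values.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """What a client reports: a sum and a count, never the raw values."""
    total: float
    count: int


def client_summarize(values):
    """Runs on the client device (hypothetical helper)."""
    return LocalSummary(total=float(sum(values)), count=len(values))


def server_mean(summaries):
    """Runs on the server: combines summaries into a global mean."""
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no data points reported")
    return total / count


# Illustrative data held by three clients; it stays inside client_summarize.
client_data = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
summaries = [client_summarize(values) for values in client_data]
global_mean = server_mean(summaries)
```
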

The hierarchical agent framework works in three phases:

Hierarchical Decomposition: The coarse-grained planner segments complex queries into single-intent sub-queries, while the fine-grained planner maps each sub-query into a Directed Acyclic Graph (DAG) of FA operations.
DAG Optimization: The optimizer agent rewrites and merges multiple DAGs to minimize redundant operations.
FA Execution: The optimized DAG is executed across distributed clients, preserving data privacy while keeping computation efficient.
Figure 1: The system overview of LAFA.
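The first phase can be sketched with deterministic stand-ins for the two planners. In LAFA both planners are LLM agents; here `coarse_plan` simply splits on the word "and" and `fine_plan` fills a fixed template, so this is an assumption-laden illustration of the data flow only. The operation names in `TEMPLATE` are hypothetical and are not taken from the paper. Each DAG is a mapping from an operation to the set of operations it depends on, which is the input format of Python's standard `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter

# Hypothetical FA operation template: operation -> set of predecessors.
TEMPLATE = {
    "select_clients": set(),
    "local_compute": {"select_clients"},
    "aggregate": {"local_compute"},
    "report": {"aggregate"},
}


def coarse_plan(query):
    """Stand-in for the coarse-grained planner: one sub-query per intent."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def fine_plan(sub_query):
    """Stand-in for the fine-grained planner: build a DAG for one sub-query."""
    tag = sub_query.replace(" ", "_")
    return {
        f"{op}:{tag}": {f"{dep}:{tag}" for dep in deps}
        for op, deps in TEMPLATE.items()
    }


def execution_order(dag):
    """Return one order in which every operation follows its predecessors."""
    return list(TopologicalSorter(dag).static_order())


query = "average age and maximum income"
sub_queries = coarse_plan(query)
dags = [fine_plan(sq) for sq in sub_queries]
orders = [execution_order(dag) for dag in dags]
```

Because each template DAG is a simple chain, `execution_order` has exactly one valid answer per sub-query; DAGs with branches admit several valid orders.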

Workflow and Execution

Execution in LAFA starts when a querier submits a query in natural language. The agents then process the query to construct an executable FA workflow. They keep FA operations in a valid logical order and eliminate redundant ones, which reduces operational overhead without sacrificing analytical accuracy.

Figure 2: The workflow of LAFA using a query as an example.
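One way to make "valid logical order" concrete is to check a proposed operation sequence against the precedence constraints of a DAG. The sketch below is an assumption about how such a check could look, not the mechanism the paper describes; the function `violations` and the operation names are hypothetical.

```python
def violations(sequence, dag):
    """Return (predecessor, operation) pairs that the sequence gets wrong.

    dag maps each operation to the set of operations that must run before it.
    A pair is reported when the predecessor does not appear strictly earlier
    than the operation, or when either of the two is missing from the sequence.
    """
    position = {op: index for index, op in enumerate(sequence)}
    bad = []
    for op, deps in dag.items():
        for dep in sorted(deps):
            missing = op not in position or dep not in position
            if missing or position[dep] >= position[op]:
                bad.append((dep, op))
    return bad


# Hypothetical precedence constraints between FA operations.
dag = {
    "select_clients": set(),
    "local_compute": {"select_clients"},
    "aggregate": {"local_compute"},
}

valid = violations(["select_clients", "local_compute", "aggregate"], dag)
invalid = violations(["select_clients", "aggregate", "local_compute"], dag)
```

Here `valid` is empty, while `invalid` reports that `aggregate` was scheduled before the `local_compute` step it depends on.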

Key Challenges Addressed

LAFA addresses two principal challenges in LLM and FA integration:

Logical Sequencing Deficiency: Existing LLM agents often generate operation sequences that violate FA procedural semantics, leading to computational errors or privacy risks. LAFA’s hierarchical agent design ensures adherence to FA workflow semantics.
Multi-Sub-Query Decomposition: Complex queries frequently contain multiple analytical intents. LLM agents can generate redundant FA operations for each intent, resulting in inefficiency. LAFA’s modular design minimizes redundancy by leveraging DAG optimization.
Figure 3: The impact of DAG optimizer in reducing operation count.
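The effect of merging DAGs can be illustrated with a deterministic stand-in for the optimizer. In LAFA the optimizer is an LLM agent; the sketch below instead assumes that two operations are identical whenever their labels match, and takes the union of nodes and dependency sets. The function `merge_dags` and all operation labels are hypothetical.

```python
from graphlib import TopologicalSorter


def merge_dags(dags):
    """Union several DAGs, keeping one copy of each identically labelled node."""
    merged = {}
    for dag in dags:
        for node, deps in dag.items():
            merged.setdefault(node, set()).update(deps)
    return merged


# Two sub-queries over the same clients: a mean and a count of "age".
dag_mean = {
    "select_clients(region=EU)": set(),
    "local_sum(age)": {"select_clients(region=EU)"},
    "local_count(age)": {"select_clients(region=EU)"},
    "aggregate_mean(age)": {"local_sum(age)", "local_count(age)"},
}
dag_count = {
    "select_clients(region=EU)": set(),
    "local_count(age)": {"select_clients(region=EU)"},
    "aggregate_count(age)": {"local_count(age)"},
}

merged = merge_dags([dag_mean, dag_count])
ops_before = len(dag_mean) + len(dag_count)
ops_after = len(merged)

# static_order raises graphlib.CycleError if the merge introduced a cycle.
order = list(TopologicalSorter(merged).static_order())
```

In this toy case the client selection and the local count are shared, so seven operations collapse to five. Matching on labels is a simplification: an optimizer that reasons about semantics could also merge operations that are equivalent but written differently.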

Evaluation and Results

LAFA improves on baseline prompting techniques in both completion ratio and operational efficiency:

Completion Ratio: LAFA achieves near-perfect completion ratios across diverse query sets, significantly outperforming zero-shot and one-shot prompting strategies.
Operation Count: LAFA markedly reduces redundant operations. The optimized execution plan lowers both computational and communication overhead.
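The two metrics can be written down as simple ratios. The definitions below are assumptions, since this summary does not give the paper's exact formulas: completion ratio is taken as the fraction of queries whose plan executes successfully, and operation reduction as the relative drop in operation count. The input values are placeholders for illustration, not results from the paper.

```python
def completion_ratio(outcomes):
    """Fraction of queries whose execution plan succeeded (assumed definition)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def operation_reduction(baseline_ops, optimized_ops):
    """Relative reduction in FA operation count (assumed definition)."""
    if baseline_ops <= 0:
        raise ValueError("baseline_ops must be positive")
    return (baseline_ops - optimized_ops) / baseline_ops


# Placeholder inputs, not measurements from the paper.
outcomes = [True, True, False, True]
ratio = completion_ratio(outcomes)
reduction = operation_reduction(baseline_ops=7, optimized_ops=5)
```
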

These results indicate that LAFA can answer complex federated analytics queries more reliably and with fewer operations than baseline prompting, while raw data stays on client devices.

Conclusion

LAFA combines the natural language capabilities of LLMs with privacy-preserving federated computation, allowing complex analytics to run across decentralized data sources without compromising data privacy. Future developments might include support for more diverse data types and improved optimization algorithms, which would broaden the system's applicability and efficiency in federated environments.
