Papers
Topics
Authors
Recent
Search
2000 character limit reached

DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL

Published 6 Mar 2025 in cs.CL | (2503.04959v2)

Abstract: Recent text-to-SQL systems powered by LLMs have demonstrated remarkable performance in translating natural language queries into SQL. However, these systems often struggle with complex database structures and domain-specific queries, as they primarily focus on enhancing logical reasoning and SQL syntax while overlooking the critical need for comprehensive database understanding. To address this limitation, we propose DB-Explore, a novel framework that systematically aligns LLMs with database knowledge through automated exploration and instruction synthesis. DB-Explore constructs database graphs to capture complex relational schemas, leverages GPT-4 to systematically mine structural patterns and semantic knowledge, and synthesizes instructions to distill this knowledge for efficient fine-tuning of LLMs. Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation, bridging the gap between database structures and LLMs. Experiments conducted on the SPIDER and BIRD benchmarks validate the effectiveness of DB-Explore, achieving an execution accuracy of 67.0% on BIRD and 87.8% on SPIDER. Notably, our open-source implementation based on Qwen2.5-Coder-7B achieves state-of-the-art results at minimal computational cost, outperforming several GPT-4-driven Text-to-SQL systems.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.