- The paper presents an SMT-based approach to verify bounded equivalence in complex SQL queries with rigorous integrity constraint encoding.
- It translates SQL queries into logical SMT formulas, enabling efficient counterexample generation which outperforms existing techniques.
- Comprehensive evaluations on benchmarks like LeetCode and Calcite illustrate significant improvements in performance and coverage.
VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints
The paper "VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints" presents an approach to verifying the equivalence of SQL queries that involve complex features and integrity constraints. Equivalence checking underpins tasks such as query rewriting and automated grading, but existing methods handle only a limited set of SQL features, which restricts their applicability in real-world scenarios. The paper proposes a new SMT-based approach that overcomes these limitations and can prove or disprove the bounded equivalence of complex SQL queries. This essay summarizes the methodology, the evaluation, and the potential implications.
Introduction
Equivalence checking for SQL queries is essential for validating query rewritings and for automated grading in database applications. Formally verifying that two complex SQL queries produce the same output on every input database up to a certain size helps ensure correctness in SQL-dependent systems. The paper introduces VeriEQL, a tool that performs this bounded verification with an SMT-based approach built on a new logical encoding. VeriEQL supports complex SQL features such as sorting, CASE statements, and rich integrity constraints that existing equivalence checkers do not handle. In an extensive evaluation on over 20,000 benchmarks, VeriEQL outperforms state-of-the-art techniques by more than one order of magnitude at verifying bounded equivalence or refuting it with a generated counterexample.
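The notion of bounded equivalence described above can be written compactly. The notation below is an assumption made for this summary rather than the paper's own: N is the bound on relation size, C is the set of integrity constraints, D ranges over databases, and the double brackets denote the result of evaluating a query on a database.

```latex
Q_1 \simeq_{N,\,\mathcal{C}} Q_2
\;\iff\;
\forall D.\;
\bigl( D \models \mathcal{C} \;\land\; \forall R \in D.\ |R| \le N \bigr)
\;\Longrightarrow\;
[\![ Q_1 ]\!](D) = [\![ Q_2 ]\!](D)
```

A counterexample is then any database D that satisfies C, respects the bound N, and makes the two results differ.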
Figure 1: Schematic workflow of VeriEQL.
Implementation Strategy
Key Components of VeriEQL
- SMT-Based Approach: VeriEQL employs a novel SMT-based logical encoding that models query semantics in the theory of integers with uninterpreted functions. This encoding lets it handle complex SQL queries with sorting, CASE statements, and rich integrity constraints directly, without relying on indirect encodings such as those required by Rosette.
Algorithm
The VeriEQL algorithm translates the SQL equivalence problem into an SMT problem for bounded verification:
- Symbolic Database Construction:
- Construct a symbolic representation of the database, allowing for relations to contain up to a predefined bound of symbolic tuples (Figure 2). Each tuple status (deleted or not) is represented through uninterpreted predicates.
- Integrity Constraint Encoding:
- Translate integrity constraints into SMT formulas over the symbolic tuples. Supported constraints include primary keys, foreign keys, NOT NULL, CHECK constraints, and auto-increment attributes. Because the constraints are part of the formula, any generated counterexample satisfies them.





Figure 2: Schema, integrity constraint, and queries.
- SQL Semantics Encoding:
- Formalize SQL semantics using higher-order functions that provide a logical foundation for encoding in SMT.
- Encode the operators of each query symbolically so that their semantics are captured in the SMT formula, as shown in Figure 21 and related figures in the supplementary materials.

Figure 3: The paired mechanism of EXCEPT.
- Equivalence Verification and Counterexample Generation:
- Construct an SMT formula stating that the two query outputs differ on some database of the bounded size that satisfies the given constraints.
- Use an SMT solver to decide the formula. If it is unsatisfiable, the queries are equivalent within the bound. If it is satisfiable, VeriEQL builds a counterexample database from the solver's model, which shows concretely why the queries are not equivalent.
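The steps above can be illustrated with a minimal sketch. It is not VeriEQL's encoding: brute-force enumeration over a tiny value domain stands in for the SMT solver, and the `emp(id, dept)` schema, the three queries, and all function names are hypothetical. The point is only to show how a size bound, an integrity constraint, and bag-semantics comparison combine into a search for a counterexample.

```python
from collections import Counter
from itertools import product

DOMAIN = (0, 1)  # tiny value domain, an assumption to keep enumeration small


def databases(bound):
    """Yield every emp table of (id, dept) rows with at most `bound` rows."""
    rows = list(product(DOMAIN, repeat=2))
    for size in range(bound + 1):
        for table in product(rows, repeat=size):
            yield table


def primary_key_ok(table):
    """Integrity constraint: id is a primary key (no duplicate ids)."""
    ids = [row[0] for row in table]
    return len(ids) == len(set(ids))


def q_all_ids(table):       # SELECT id FROM emp
    return Counter(row[0] for row in table)


def q_distinct_ids(table):  # SELECT DISTINCT id FROM emp
    return Counter({row[0] for row in table})


def q_filtered_ids(table):  # SELECT id FROM emp WHERE dept = 1
    return Counter(row[0] for row in table if row[1] == 1)


def find_counterexample(q1, q2, bound, constraint=lambda table: True):
    """Return a table within the bound that satisfies `constraint` and on
    which q1 and q2 return different bags, or None if no such table exists."""
    for table in databases(bound):
        if constraint(table) and q1(table) != q2(table):
            return table
    return None


# Equivalent within the bound once the primary key is enforced.
verified = find_counterexample(q_all_ids, q_distinct_ids, 2, primary_key_ok)
# Without the constraint, the search reports a table with a duplicate id.
spurious = find_counterexample(q_all_ids, q_distinct_ids, 2)
# A genuinely non-equivalent pair is refuted even with the constraint.
refuted = find_counterexample(q_all_ids, q_filtered_ids, 2, primary_key_ok)

print(verified, spurious, refuted)
```

The second call shows why constraint encoding matters: the table it returns violates the primary key, so it could never occur in a valid database. An SMT-based tool replaces the enumeration loop with a single satisfiability query over symbolic tuples, which is what lets the approach scale beyond toy domains.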
Evaluation
Comparison with State-of-the-Art Techniques
VeriEQL was evaluated using three different workloads: LeetCode, Calcite, and Literature, against various state-of-the-art bounded verification and testing tools. The results demonstrated VeriEQL's superior support for a larger set of complex SQL queries, and its ability to identify significantly more non-equivalent queries than existing techniques.
- LeetCode Workload: VeriEQL demonstrated a coverage of nearly 78% of the benchmarks and could successfully refute over 15%, identifying genuine counterexamples for faulty queries.
- Calcite Workload: Although the benchmark may itself contain some non-equivalent query pairs, VeriEQL outperformed the other tools, proving or disproving more examples within a 10-minute time limit.
- Literature Workload: VeriEQL supported almost all benchmarks in this category and found valid counterexamples for two orders of magnitude more benchmarks compared to current state-of-the-art techniques.
Practical and Theoretical Implications
The proposed approach gives practitioners and researchers who work with SQL queries a stronger toolset. It supports equivalence verification that takes integrity constraints into account, which reduces spurious counterexamples. In evaluations involving databases such as MySQL and platforms such as LeetCode, VeriEQL helped identify bugs in real-world applications and suggest additional test cases. Given its performance gains and broader coverage, VeriEQL contributes both to the theory of bounded verification and to its practical applications in areas such as database optimization, education, and system debugging.
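One way to see the difference between a genuine and a spurious counterexample is to replay a candidate database on a real engine. The sketch below is an illustration under stated assumptions, not the paper's validation pipeline: SQLite (from the Python standard library) stands in for MySQL, and the `emp` schema, the two queries, and the `replay` helper are hypothetical.

```python
import sqlite3

# Hypothetical schema and query pair used only for illustration.
SCHEMA = "CREATE TABLE emp (id INTEGER PRIMARY KEY, dept INTEGER NOT NULL)"
Q1 = "SELECT id FROM emp"
Q2 = "SELECT id FROM emp WHERE dept = 1"


def replay(rows, q1=Q1, q2=Q2):
    """Load candidate rows into a fresh in-memory database and classify them.

    Returns "spurious" if the rows violate the schema's integrity constraints,
    "genuine" if both queries run and their outputs differ as bags, and
    "not a counterexample" if the outputs agree.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        try:
            conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
        except sqlite3.IntegrityError:
            return "spurious"
        # Sorting compares the outputs as bags; neither query has ORDER BY.
        out1 = sorted(conn.execute(q1).fetchall())
        out2 = sorted(conn.execute(q2).fetchall())
        return "genuine" if out1 != out2 else "not a counterexample"
    finally:
        conn.close()


print(replay([(0, 0)]))          # a valid table on which the outputs differ
print(replay([(0, 0), (0, 0)]))  # duplicate primary key, rejected on insert
print(replay([(0, 1)]))          # a valid table on which the outputs agree
```

A checker that ignores integrity constraints can report tables like the second one, which the database would never accept. Encoding the constraints in the verification formula avoids producing them in the first place.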
Conclusion
"VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints" presents a practical approach to formally verifying or refuting SQL query equivalence. With its SMT-based encoding and support for complex queries with rich integrity constraints, VeriEQL substantially outperforms existing techniques, making it useful for both academic research and applied database tasks. Its ability to generate concrete counterexamples makes it especially valuable for finding and fixing incorrect SQL queries.