TiSQL: Temporal, LLM, & Synthetic SQL Insights
- TiSQL is a multi-faceted concept that includes native temporal SQL extensions with robust support for temporal and bitemporal data, enabling succinct historical queries.
- TiSQL also denotes an LLM-driven text-to-SQL component that converts clarified natural language into precise SQL using schema filtering and iterative self-refinement.
- TiSQL encompasses synthetic datasets like TinySQL designed for mechanistic interpretability, offering controlled, graded challenges to benchmark SQL-generation models.
TiSQL refers to three distinct, technically unrelated systems and frameworks that share the same acronym but address different aspects of SQL technology: (1) TiSQL as the temporal SQL3 extension for native temporal and bitemporal database support (Mkaouar et al., 2011), (2) TiSQL as the text-to-SQL prompt-engineering component of the TiInsight LLM-powered EDA platform (Zhu et al., 14 Jan 2026), and (3) instances of TiSQL as synthetic datasets for interpretability research, such as “TinySQL” (Harrasse et al., 17 Mar 2025). The unifying element among these systems is the extension or facilitation of SQL for richer semantics, automation, or research, but each operates at a different layer of database and data science pipelines.
1. Native Temporal SQL Extensions: TiSQL in Temporal Databases
TiSQL, as introduced by Mkaouar, Bouaziz, and Moalla, denotes a native extension to SQL3 designed for seamless support of temporal and bitemporal data (Mkaouar et al., 2011). The TiSQL model augments the object-relational data model by allowing any table or attribute to carry Valid-Time (VT), Transaction-Time (TT), or both (bitemporal) stamps, implemented as lists of time intervals per attribute value.
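The interval-list stamping can be sketched as a simple data structure. This is an illustrative model only (the class and field names are hypothetical, not the paper's implementation):

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

# A half-open time interval (start, end); end = None means "until changed".
Interval = Tuple[int, Optional[int]]

@dataclass
class StampedValue:
    """One attribute value stamped with valid-time and transaction-time intervals."""
    value: Any
    vt: List[Interval] = field(default_factory=list)  # Valid-Time intervals
    tt: List[Interval] = field(default_factory=list)  # Transaction-Time intervals

# A bitemporal attribute is then a list of stamped values,
# e.g. a teacher's Status history:
status_history = [
    StampedValue("Assistant", vt=[(2000, 2005)], tt=[(2000, None)]),
    StampedValue("Professor", vt=[(2005, None)], tt=[(2005, None)]),
]
```

Under this representation, a HISTORY query simply returns all stamped values, while a time-slice query filters on interval membership.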
TiSQL extends all classical relational operators (selection, projection, join, set, cartesian product) to their temporal analogues via dedicated keywords and syntax. Seven core temporal qualifiers (HISTORY, PAST, FUTURE, @, BETWEEN, WHEN, SINCE/BEFORE/AFTER) are introduced, which can be applied to query, join, or set operators. Additional bitemporal qualifiers (RETROACTIF, POSTACTIF, ERRONEOUS) refine transaction-time updates.
A table of key temporal operators and their semantics:
| TiSQL Operator | Formal Definition/Effect | Supported Clause |
|---|---|---|
| HISTORY | Filters tuples across entire VT | SELECT, FROM, JOIN |
| PAST/FUTURE | Restricts VT to intervals before/after the current time | SELECT/WHERE |
| @ d (time-slice) | Filters for validity at date d | SELECT, FROM |
| BETWEEN d1 AND d2 | Intersects VT intervals with [d1, d2] | SELECT, FROM |
| RETROACTIF/POSTACTIF | Restricts TT update domain | FROM/JOIN |
| TAG ON/CORRECT | Insert non-destructive/rectificatory updates | UPDATE |
These TiSQL constructs enable queries such as:
```sql
SELECT HISTORY Status
FROM TEACHER
WHERE TeacherNum = 123;
```

```sql
SELECT DeptCode, MAX(HISTORY Budget DECADE)
FROM DEPARTMENT
GROUP BY DeptCode;
```
Temporal updates (e.g., TAG ON, CORRECT) allow insertion and correction of historical facts in a non-destructive, transaction-time consistent manner, obviating the need for ad hoc period tables or procedural triggers.
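The non-destructive correction semantics can be illustrated with a minimal sketch (hypothetical structure and function names; the paper specifies only the declarative CORRECT syntax): a correction closes the transaction-time interval of the erroneous fact and appends the corrected fact, so nothing is ever deleted.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fact:
    value: str
    vt_start: int          # valid-time start
    vt_end: Optional[int]  # None = still valid
    tt_start: int          # transaction-time start (when recorded)
    tt_end: Optional[int]  # None = current belief

def correct(history: List[Fact], old: Fact, new_value: str, now: int) -> List[Fact]:
    """CORRECT-style update: close the old fact's TT interval and append a
    corrected fact covering the same valid time. History is never overwritten."""
    old.tt_end = now
    history.append(Fact(new_value, old.vt_start, old.vt_end, now, None))
    return history

history = [Fact("Budget=100", 2010, 2011, 2010, None)]
correct(history, history[0], "Budget=120", 2012)
# Both the erroneous and the corrected fact remain queryable via transaction time.
```

A transaction-time query at 2011 would still see the original (erroneous) fact, which is exactly the auditability that period tables emulate procedurally.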
Performance considerations are grounded in interval-algebra, recommending early temporal restriction pushdown, indexing strategies (B+-tree on temporal keys, interval-trees), and horizontal partitioning by time slices. No explicit algorithmic complexity bounds are provided, but empirical results suggest parity with well-optimized period-table systems, with native TiSQL syntax yielding more succinct query formulations (Mkaouar et al., 2011).
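The recommended early temporal restriction can be sketched as filtering on interval overlap before any join work (illustrative Python, not the paper's optimizer; the row layout is assumed):

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open interval overlap test: [a_start, a_end) meets [b_start, b_end)."""
    return a_start < b_end and b_start < a_end

def temporal_pushdown(rows, q_start, q_end):
    """Apply a BETWEEN d1 AND d2 restriction early: keep only rows whose
    valid-time interval intersects the query window, shrinking join inputs."""
    return [r for r in rows if overlaps(r["vt_start"], r["vt_end"], q_start, q_end)]

rows = [
    {"dept": "CS", "budget": 100, "vt_start": 2000, "vt_end": 2005},
    {"dept": "CS", "budget": 120, "vt_start": 2005, "vt_end": 2010},
]
temporal_pushdown(rows, 2006, 2008)  # keeps only the 2005-2010 row
```

An interval-tree or B+-tree index on (vt_start, vt_end) would replace the linear scan in a real engine; the pushdown itself is the same predicate applied as early as possible.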
2. LLM-Based Text-to-SQL with TiSQL in TiInsight
In the TiInsight end-to-end Automated Exploratory Data Analysis (EDA) pipeline, TiSQL is the text-to-SQL transformation component that converts clarified and decomposed natural language sub-questions into executable SQL (Zhu et al., 14 Jan 2026). TiSQL’s design leverages off-the-shelf proprietary LLMs (e.g., GPT-4), with no custom fine-tuning or architecture changes.
The TiInsight TiSQL component operates as follows:
- Inputs: Receives a clarified or decomposed sub-question and the Hierarchical Data Context (HDC), which encapsulates database, table, column, and relationship summaries.
- Two-stage Schema Filtering:
- Coarse map: Vector-index lookups (e.g., cosine similarity between the sub-question embedding and table summaries) select top-k candidate tables.
- Fine map: Partitioned table groups are sent in parallel to the LLM with CoT prompts to select precise tables and columns.
- Reduce: Merges decisions to obtain the minimal schema subset required for SQL generation.
- Prompt-based SQL Generation: Constructs an LLM prompt containing the chosen HDC fragments and the sub-question; uses a chain-of-thought pattern to induce explicit generation of joins, filters, aggregates, and so forth.
- Self-refinement Loop:
- EXPLAIN-refine: SQL is wrapped in an EXPLAIN statement; LLM consumes DBMS feedback to correct errors.
- EXECUTE-refine: Runtime errors trigger another round of LLM corrections using error message context.
- Output: Returns valid, semantically correct SQL for downstream visualization (via TiChart) or tabular presentation.
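The coarse-map stage can be sketched with plain cosine similarity over embedding vectors. This is a minimal sketch under assumptions: the toy 3-d vectors stand in for real encoder output, and `coarse_map` is a hypothetical name, not TiInsight's API.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def coarse_map(question_vec, table_summaries, k=2):
    """Rank table summaries by similarity to the sub-question embedding and
    keep the top-k candidates for the fine-map LLM stage."""
    scored = sorted(table_summaries.items(),
                    key=lambda kv: cosine(question_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Toy embeddings standing in for real encoder output.
tables = {"fed_rates": [0.9, 0.1, 0.0], "cpi": [0.7, 0.3, 0.0], "hr": [0.0, 0.0, 1.0]}
coarse_map([1.0, 0.2, 0.0], tables)  # -> ["fed_rates", "cpi"]
```

The fine-map stage would then prompt the LLM with only these candidates, keeping the schema description within the context window.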
Unlike neural text-to-SQL models with custom training, TiSQL as instantiated here is a framework atop LLM prompt engineering and iterative refinement. HDC injection is handled via prompt preamble; schema overflow is managed by the aforementioned filtering process to respect LLM context window limits.
No unique supervised loss, reward signal, or RL objective is introduced; TiSQL's correctness arises entirely from the pre-trained LLM's autoregressive language modeling performance. TiSQL's authors state that it "significantly reduces schema-linking complexity" and qualitatively "achieves higher correctness in practice" than generic zero-shot prompting, but quantitative metrics are deferred to future publications (Zhu et al., 14 Jan 2026).
Illustrative end-to-end transformation:
- Input: "Identify the impact of Federal Reserve interest rate hikes."
- The question is clarified and decomposed into sub-questions by an LLM agent: retrieve time series for the federal funds rate and the CPI.
- Each sub-question is mapped to relevant tables/columns, the LLM generates candidate SQL, which is refined with EXPLAIN/EXECUTE feedback and passed to the visualization modules.
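The EXPLAIN/EXECUTE refinement loop can be sketched as a retry wrapper. The `explain`, `execute`, and `llm_fix` callables here are placeholders for the DBMS and LLM calls, not TiInsight APIs:

```python
def refine_sql(sql, explain, execute, llm_fix, max_rounds=3):
    """EXPLAIN-refine then EXECUTE-refine: feed DBMS error messages back to the
    LLM until the query plans and runs, or the retry budget is exhausted.
    Each callable returns None on success or an error message string."""
    for _ in range(max_rounds):
        err = explain(sql)           # EXPLAIN-refine: static plan check
        if err is None:
            err = execute(sql)       # EXECUTE-refine: runtime errors caught here
        if err is None:
            return sql               # valid, executable SQL
        sql = llm_fix(sql, err)      # LLM rewrites SQL using the error context
    raise RuntimeError("refinement budget exhausted")
```

With stub callables, a query missing its FROM clause is repaired in one round; the latency cost of the extra LLM calls per round is exactly the limitation the paper notes.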
Limitations include potential prompt window overflow for extremely wide schemas, LLM hallucination of non-existent schema elements, and increased latency and API costs due to multiple LLM calls per user question. Proposed future extensions include lightweight LLM fine-tuning, learned rerankers for SQL refinement, and a multi-agent decomposition framework (Zhu et al., 14 Jan 2026).
3. Synthetic Text-to-SQL Datasets: TinySQL for Mechanistic Interpretability
TinySQL is a synthetic text-to-SQL dataset designed for mechanistic interpretability research, enabling controlled probing of LLMs’ internal circuits that underlie SQL generation (Harrasse et al., 17 Mar 2025). The dataset includes:
- Construction: 300,000 examples (100,000 per subset), spanning single-table queries with 2–12 column schemas and escalating in SQL/natural language complexity.
- Subsets: Three base variants by SQL complexity (CS1, CS2, CS3), corresponding synonym-perturbed versions (CS1_Syn, etc.), and free-form natural language variants (CS1_Nat, etc.), with full schema and query definitions per instance.
- SQL Grammar: Each tier's structure is specified by an explicit context-free grammar.
- Benchmarking: Small models (33M–1B params) trained on TinySQL achieve exact-match SQL accuracy of 85% and above on test tasks, providing a tunable challenge for interpretability research.
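The grammar-tiered construction can be illustrated with a toy context-free grammar and a recursive sampler. This grammar is purely illustrative and much smaller than TinySQL's published ones:

```python
import random

# Toy CS1-style grammar: SELECT one column FROM one table, optional ORDER BY.
GRAMMAR = {
    "query": [["SELECT", "col", "FROM", "table"],
              ["SELECT", "col", "FROM", "table", "ORDER BY", "col"]],
    "col":   [["name"], ["age"], ["salary"]],
    "table": [["employees"], ["teachers"]],
}

def generate(symbol="query", rng=random):
    """Expand a nonterminal by picking a random production; terminals pass through."""
    if symbol not in GRAMMAR:
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return " ".join(generate(s, rng) for s in production)

sql = generate()  # e.g. "SELECT age FROM employees ORDER BY name"
```

Escalating tiers would add productions for WHERE filters, aggregates, and so on, which is what makes the difficulty of the generated dataset precisely controllable.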
TinySQL is utilized for structured mechanistic analyses using methods such as activation patching, edge attribution patching, and sparse autoencoders. These techniques expose how transformer heads or MLPs are causally involved in encoding SQL subskills: SELECT–FROM identification, ORDER BY clause synthesis, etc. For instance, in small models, a minimal circuit of 10–15 heads suffices to preserve accuracy on basic SQL (Harrasse et al., 17 Mar 2025).
A significant insight is that larger models diffuse circuit responsibility across MLPs and layers, while subskill identifiability is clearer in smaller models. Interpretability analyses on TinySQL are feeding back into dataset design, eliminating superficial cues that models might exploit.
4. Comparison and Positioning of TiSQL Frameworks
Although sharing the TiSQL moniker, the temporal SQL extension, the TiInsight text-to-SQL module, and TinySQL for interpretability operate at different logical layers.
| Framework | Primary Function | Methodology | Key Features |
|---|---|---|---|
| Temporal TiSQL (Mkaouar et al., 2011) | Native temporal data & queries | SQL3 extension, declarative | VT/TT stamps, temporal ops, bitemporality |
| TiInsight TiSQL (Zhu et al., 14 Jan 2026) | Natural language to SQL | LLM prompting, schema filtering | HDC, CoT prompts, self-refinement |
| TinySQL (Harrasse et al., 17 Mar 2025) | Mechanistic interpretability of SQL-generating models | Synthetic data, ablation, attribution | Progressive grammatical complexity |
The temporal TiSQL extension is closest to SQL/DBMS semantics, directly impacting how databases store and query temporal facts. TiInsight’s TiSQL is a meta-layer, acting as an intelligent translation interface for end-users, leveraging LLM capabilities without modifying underlying SQL. TinySQL is not an interface or extension, but a research artifact—neither an LLM nor a query engine, but an experimental scaffold to study and reverse-engineer model architectures in the context of SQL.
5. Limitations and Future Directions
For native temporal TiSQL (Mkaouar et al., 2011), practical limitations stem from schema bloat, index maintenance, and the increased algorithmic complexity of interval operations. Temporal pushdown and optimized interval indexing are recommended.
For TiInsight TiSQL (Zhu et al., 14 Jan 2026), the main technical risks are prompt length limits (mitigated via hierarchical data context filtering), hallucination (especially when HDC summaries are imperfect), and the latency imposed by serial LLM refinement. Scalability is manageable but not eliminated; the paper suggests future research in lightweight LLM fine-tuning and multi-agent decomposition approaches.
TinySQL’s limitation is its current focus on single-table queries, which, while ideal for fine-grained mechanistic interpretability, lacks the complexity of realistic multi-table, join-heavy database environments. Extension to richer query classes is explicitly identified as future work (Harrasse et al., 17 Mar 2025).
6. Impact and Research Significance
Native temporal TiSQL has advanced the declarative management of temporal facts, rendering time a first-class concept in SQL. It catalyzed the formalization of temporal operators, enabling concise point- and interval-based queries—critical in financial, legal, and scientific temporal datasets (Mkaouar et al., 2011).
TiInsight’s TiSQL text-to-SQL methodology represents the operationalization of LLMs in real-world EDA workflows, abstracting away schema complexity and facilitating domain-expert data exploration. Its deployment shows LLMs’ strengths when combined with domain-aware prompt engineering and iterative refinement (Zhu et al., 14 Jan 2026).
TinySQL is pioneering as an interpretability testbed, providing replicable, progressive graded challenge datasets to correlate transformer subcircuits with explicit SQL reasoning tasks. This work demarcates a new experimental regime in mechanistic interpretability and offers a blueprint for future synthetic benchmarks (Harrasse et al., 17 Mar 2025).
The term “TiSQL” therefore encompasses architectures, prompting paradigms, and datasets that each push the evolution of SQL-centric analytics in their respective dimensions: language, automation, and scientific introspection.