
TiSQL: Temporal, LLM, & Synthetic SQL Insights

Updated 21 January 2026
  • TiSQL is a multi-faceted concept that includes native temporal SQL extensions with robust support for temporal and bitemporal data, enabling succinct historical queries.
  • TiSQL also denotes an LLM-driven text-to-SQL component that converts clarified natural language into precise SQL using schema filtering and iterative self-refinement.
  • TiSQL encompasses synthetic datasets like TinySQL designed for mechanistic interpretability, offering controlled, graded challenges to benchmark SQL-generation models.

TiSQL refers to three distinct, technically unrelated systems and frameworks that share the same acronym but address different aspects of SQL technology: (1) TiSQL as the temporal SQL3 extension for native temporal and bitemporal database support (Mkaouar et al., 2011), (2) TiSQL as the text-to-SQL prompt-engineering component of the TiInsight LLM-powered EDA platform (Zhu et al., 14 Jan 2026), and (3) instances of TiSQL as synthetic datasets for interpretability research, such as “TinySQL” (Harrasse et al., 17 Mar 2025). The unifying element among these systems is the extension or facilitation of SQL for richer semantics, automation, or research, but each operates at a different layer of database and data science pipelines.

1. Native Temporal SQL Extensions: TiSQL in Temporal Databases

TiSQL, as introduced by Mkaouar, Bouaziz, and Moalla, denotes a native extension to SQL3 designed for seamless support of temporal and bitemporal data (Mkaouar et al., 2011). The TiSQL model augments the object-relational data model by allowing any table or attribute to carry Valid-Time (VT), Transaction-Time (TT), or both (bitemporal) stamps. These are implemented as lists of time intervals per attribute value, e.g., $\text{Person.Status} : \{ \langle \text{value}, [\text{VT}_1, \text{VT}_2, \ldots], [\text{TT}_1, \text{TT}_2, \ldots] \rangle \}^*$.
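The per-value interval lists can be pictured with a small data structure. The following is a minimal Python sketch, not the paper's implementation: the names `StampedValue`, `covers`, and `value_at` are hypothetical, intervals are assumed to be half-open at day granularity, and the sample history is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Half-open interval [start, end); date.max stands in for "until changed".
Interval = Tuple[date, date]

@dataclass
class StampedValue:
    """One attribute value with its valid-time and transaction-time interval lists."""
    value: str
    valid_time: List[Interval]
    transaction_time: List[Interval]

def covers(intervals: List[Interval], d: date) -> bool:
    """True if any half-open interval in the list contains d."""
    return any(start <= d < end for start, end in intervals)

def value_at(history: List[StampedValue], valid_on: date, known_on: date) -> Optional[str]:
    """Bitemporal lookup: the value valid on `valid_on`, as recorded on `known_on`."""
    for entry in history:
        if covers(entry.valid_time, valid_on) and covers(entry.transaction_time, known_on):
            return entry.value
    return None

# Hypothetical history of Person.Status (the promotion was recorded six weeks late).
status_history = [
    StampedValue("Assistant", [(date(2005, 9, 1), date(2009, 9, 1))],
                 [(date(2005, 9, 1), date.max)]),
    StampedValue("Professor", [(date(2009, 9, 1), date.max)],
                 [(date(2009, 10, 15), date.max)]),
]
```

Asking for the status valid on 1 January 2010 gives no answer if the database is consulted as of 1 October 2009, because the promotion had not yet been recorded; this is the distinction between valid time and transaction time.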

TiSQL extends all classical relational operators (selection, projection, join, set, cartesian product) to their temporal analogues via dedicated keywords and syntax. Seven core temporal qualifiers (HISTORY, PAST, FUTURE, @, BETWEEN, WHEN, SINCE/BEFORE/AFTER) are introduced, which can be applied to query, join, or set operators. Additional bitemporal qualifiers (RETROACTIF, POSTACTIF, ERRONEOUS) refine transaction-time updates.

A table of key temporal operators and their semantics:

| TiSQL Operator | Formal Definition/Effect | Supported Clause |
|---|---|---|
| HISTORY | Filters tuples across entire VT | SELECT, FROM, JOIN |
| PAST/FUTURE | Restricts VT to $(-\infty, \text{now})$ or $(\text{now}, +\infty)$ | SELECT/WHERE |
| @ d (time-slice) | Filters for validity at date $d$ | SELECT, FROM |
| BETWEEN d1 AND d2 | Intersects intervals with $[d_1, d_2)$ | SELECT, FROM |
| RETROACTIF/POSTACTIF | Restricts TT update domain | FROM/JOIN |
| TAG ON/CORRECT | Insert non-destructive/rectificatory updates | UPDATE |
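The valid-time qualifiers in the table reduce to interval restriction. The sketch below is one possible reading of those semantics in Python, not code from the paper: the function names are hypothetical, intervals are assumed half-open at day granularity (so the open bound of FUTURE becomes "the day after now"), and the sample history is invented.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[date, date]            # half-open [start, end)
History = List[Tuple[str, Interval]]    # (value, valid-time interval) pairs

def clip(interval: Interval, lo: date, hi: date) -> Optional[Interval]:
    """Intersect a half-open interval with [lo, hi); None if they do not overlap."""
    start, end = max(interval[0], lo), min(interval[1], hi)
    return (start, end) if start < end else None

def restrict(history: History, lo: date, hi: date) -> History:
    """Keep only the part of each value's validity that falls inside [lo, hi)."""
    out = []
    for value, interval in history:
        clipped = clip(interval, lo, hi)
        if clipped is not None:
            out.append((value, clipped))
    return out

def past(history: History, now: date) -> History:
    """PAST: validity strictly before `now`."""
    return restrict(history, date.min, now)

def future(history: History, now: date) -> History:
    """FUTURE: validity strictly after `now` (day granularity assumed)."""
    return restrict(history, now + timedelta(days=1), date.max)

def between(history: History, d1: date, d2: date) -> History:
    """BETWEEN d1 AND d2: intersect validity with [d1, d2)."""
    return restrict(history, d1, d2)

def time_slice(history: History, d: date) -> List[str]:
    """@ d: values valid on date d."""
    return [value for value, (start, end) in history if start <= d < end]

# Hypothetical valid-time history of a teacher's status.
h = [("Assistant", (date(2005, 9, 1), date(2009, 9, 1))),
     ("Professor", (date(2009, 9, 1), date.max))]
```

HISTORY corresponds to returning `h` unrestricted; the other qualifiers narrow it before the rest of the query is evaluated.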

These TiSQL constructs enable queries such as:

SELECT HISTORY Status
  FROM TEACHER
 WHERE TeacherNum = 123;
or
SELECT DeptCode, MAX(HISTORY Budget DECADE)
  FROM DEPARTMENT
 GROUP BY DeptCode;

Temporal updates (e.g., TAG ON, CORRECT) allow insertion and correction of historical facts in a non-destructive, transaction-time consistent manner, obviating the need for ad hoc period tables or procedural triggers.
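For contrast, the manual period-table approach that TiSQL aims to replace can be sketched in standard SQL. The example below uses SQLite through Python's `sqlite3` module; the table `teacher_status` and its columns are hypothetical, and the two queries are only rough hand-written counterparts of a HISTORY query and a time-slice, not translations defined by the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher_status (
    teacher_num INTEGER NOT NULL,
    status      TEXT    NOT NULL,
    vt_start    TEXT    NOT NULL,   -- valid-time start (inclusive), ISO date
    vt_end      TEXT    NOT NULL    -- valid-time end (exclusive); '9999-12-31' = open
);
INSERT INTO teacher_status VALUES
    (123, 'Assistant', '2005-09-01', '2009-09-01'),
    (123, 'Professor', '2009-09-01', '9999-12-31'),
    (456, 'Lecturer',  '2012-01-01', '9999-12-31');
""")

# Rough counterpart of: SELECT HISTORY Status FROM TEACHER WHERE TeacherNum = 123
history = conn.execute(
    "SELECT status, vt_start, vt_end FROM teacher_status "
    "WHERE teacher_num = ? ORDER BY vt_start",
    (123,),
).fetchall()

# Rough counterpart of a time-slice (@ d): the interval predicate is spelled out by hand.
as_of = conn.execute(
    "SELECT status FROM teacher_status "
    "WHERE teacher_num = ? AND vt_start <= ? AND ? < vt_end",
    (123, "2008-01-01", "2008-01-01"),
).fetchall()
```

Every query against such a table has to repeat the interval predicate explicitly, which is the boilerplate that native temporal qualifiers are meant to remove.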

Performance considerations are grounded in interval algebra: the paper recommends pushing temporal restrictions down early, indexing temporal keys (B+-trees, interval trees), and partitioning horizontally by time slice. No explicit algorithmic complexity bounds are provided, but empirical results suggest parity with well-optimized period-table systems, with native TiSQL syntax yielding more succinct query formulations (Mkaouar et al., 2011).

2. LLM-Based Text-to-SQL with TiSQL in TiInsight

In the TiInsight end-to-end Automated Exploratory Data Analysis (EDA) pipeline, TiSQL is the text-to-SQL transformation component that converts clarified and decomposed natural language sub-questions into executable SQL (Zhu et al., 14 Jan 2026). TiSQL’s design leverages off-the-shelf proprietary LLMs (e.g., GPT-4), with no custom fine-tuning or architecture changes.

The TiInsight TiSQL component operates as follows:

  1. Inputs: Receives a clarified or decomposed sub-question $Q^*$ and the Hierarchical Data Context (HDC), which encapsulates database, table, column, and relationship summaries.
  2. Two-stage Schema Filtering:
    • Coarse map: Vector-index lookups (e.g., cosine similarity between $Q^*$ and table summaries) select top-$N$ candidate tables.
    • Fine map: Partitioned table groups are sent in parallel to the LLM with CoT prompts to select precise tables and columns.
    • Reduce: Merges decisions to obtain the minimal schema subset required for SQL generation.
  3. Prompt-based SQL Generation: Constructs an LLM prompt containing the chosen HDC fragments and $Q^*$; uses a chain-of-thought pattern to induce explicit generation of joins, filters, aggregates, and so forth.
  4. Self-refinement Loop:
    • EXPLAIN-refine: SQL is wrapped in an EXPLAIN statement; LLM consumes DBMS feedback to correct errors.
    • EXECUTE-refine: Runtime errors trigger another round of LLM corrections using error message context.
  5. Output: Returns valid, semantically correct SQL for downstream visualization (via TiChart) or tabular presentation.
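The self-refinement step (item 4) can be sketched as a small retry loop. This is an illustration under stated assumptions, not TiInsight's code: SQLite stands in for the DBMS, `refine_sql` is a hypothetical name, and `fake_llm_fix` is a stub that replaces the real LLM call by repairing one known mistake.

```python
import sqlite3
from typing import Callable, Optional

def refine_sql(conn: sqlite3.Connection, sql: str,
               fix: Callable[[str, str], str], max_rounds: int = 3) -> Optional[list]:
    """Return rows for `sql`, asking `fix(sql, error)` for a repair after each failure."""
    for _ in range(max_rounds):
        try:
            conn.execute("EXPLAIN " + sql)        # EXPLAIN-refine: compile without running
            return conn.execute(sql).fetchall()   # EXECUTE-refine: run the statement
        except sqlite3.Error as err:
            sql = fix(sql, str(err))              # feed the DBMS message back to the "model"
    return None                                   # give up after max_rounds attempts

def fake_llm_fix(sql: str, error: str) -> str:
    """Hypothetical stand-in for an LLM call: repairs a single known column typo."""
    if "rate_pct" in error:
        return sql.replace("rate_pct", "rate")
    return sql

# Hypothetical table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fed_funds (month TEXT, rate REAL)")
conn.executemany("INSERT INTO fed_funds VALUES (?, ?)",
                 [("2022-03", 0.25), ("2022-05", 0.75)])

rows = refine_sql(conn, "SELECT month, rate_pct FROM fed_funds ORDER BY month", fake_llm_fix)
```

Checking with EXPLAIN first lets schema and syntax errors surface before any data is touched; errors that only appear at run time are caught by the same handler on the second call.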

Unlike neural text-to-SQL models with custom training, TiSQL as instantiated here is a framework atop LLM prompt engineering and iterative refinement. HDC injection is handled via prompt preamble; schema overflow is managed by the aforementioned filtering process to respect LLM context window limits.
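The coarse stage of the schema filter amounts to ranking table summaries by similarity to the question. The sketch below illustrates only that ranking step: a bag-of-words vector stands in for a real embedding model, and the function names, table names, and summaries are all hypothetical. The fine-grained LLM stage is omitted.

```python
import math
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def coarse_filter(question: str, table_summaries: Dict[str, str], top_n: int) -> List[str]:
    """Return the top_n table names whose summaries are most similar to the question."""
    q = embed(question)
    ranked = sorted(table_summaries,
                    key=lambda name: cosine(q, embed(table_summaries[name])),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical table summaries, as an HDC might provide them.
summaries = {
    "fed_funds": "monthly federal funds interest rate",
    "cpi": "monthly consumer price index",
    "employees": "staff names and salaries",
}
candidates = coarse_filter("monthly federal funds rate and consumer price index", summaries, top_n=2)
```

Only the surviving candidates would be passed to the LLM for table and column selection, which keeps the prompt within the context window.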

No unique supervised loss, reward signal, or RL objective is introduced; TiSQL’s correctness arises entirely from the pre-trained LLM’s autoregressive language modeling performance. The authors state that TiSQL “significantly reduces schema-linking complexity” and qualitatively “achieves higher correctness in practice” than generic zero-shot prompting, but quantitative metrics are deferred to future publications (Zhu et al., 14 Jan 2026).

Illustrative end-to-end transformation:

  • Input: "Identify the impact of Federal Reserve interest rate hikes."
  • Clarified and decomposed into sub-questions by an LLM agent, e.g., retrieve time series for the federal funds rate and CPI.
  • Each sub-question is mapped to the relevant tables and columns; the LLM generates SQL, which is refined with EXPLAIN/EXECUTE feedback and passed to the visualization modules.

Limitations include potential prompt window overflow for extremely wide schemas, LLM hallucination of non-existent schema elements, and increased latency and API costs due to multiple LLM calls per user question. Proposed future extensions include lightweight LLM fine-tuning, learned rerankers for SQL refinement, and a multi-agent decomposition framework (Zhu et al., 14 Jan 2026).

3. Synthetic Text-to-SQL Datasets: TinySQL for Mechanistic Interpretability

TinySQL is a synthetic text-to-SQL dataset designed for mechanistic interpretability research, enabling controlled probing of LLMs’ internal circuits that underlie SQL generation (Harrasse et al., 17 Mar 2025). The dataset includes:

  • Construction: 300,000 examples (100,000 per subset), spanning single-table queries with 2–12 column schemas and escalating in SQL/natural language complexity.
  • Subsets: Three base (CS1, CS2, CS3) variants by SQL complexity, corresponding synonym-perturbed (CS1_Syn, etc.), and a free-form natural language subset (CS1_Nat), with full schema and query definitions per instance.
  • SQL Grammar: Each tier’s structure is specified by explicit context-free grammars, e.g.

$\langle \text{CS1} \rangle \rightarrow \text{SELECT } \langle \text{FieldList} \rangle\ \text{FROM } \langle \text{Table} \rangle$

$\langle \text{CS3} \rangle \rightarrow \text{SELECT } \langle \text{AggFunc} \rangle(\langle \text{FieldList} \rangle)\ [\text{AS}\ \langle \text{Alias} \rangle]\ \text{FROM}\ \langle \text{Table} \rangle\ [\text{GROUP BY}\ \langle \text{FieldList} \rangle]\ [\text{ORDER BY}\ \langle \text{FieldList} \rangle\ \langle \text{Dir} \rangle]$

  • Benchmarking: Small models (roughly 33M to 1B parameters) trained on TinySQL achieve from about 85% to over 98% exact-match SQL accuracy on test tasks, providing a tunable challenge for interpretability research.
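Grammar-driven generation and exact-match scoring of this kind can be sketched in a few lines. The code below is an illustration, not the TinySQL generator: the function names, the set of aggregate functions, the alias scheme, the 50% probability for optional clauses, and whitespace normalization before comparison are all assumptions, and the sampler checks syntax only.

```python
import random
from typing import List

AGG_FUNCS = ["COUNT", "SUM", "MIN", "MAX", "AVG"]   # assumed set of aggregate functions

def sample_cs1(table: str, columns: List[str], rng: random.Random) -> str:
    """Sample one query from: <CS1> -> SELECT <FieldList> FROM <Table>."""
    fields = rng.sample(columns, rng.randint(1, len(columns)))
    return f"SELECT {', '.join(fields)} FROM {table}"

def sample_cs3(table: str, columns: List[str], rng: random.Random) -> str:
    """Sample one query from the CS3 rule; each bracketed part is included at random."""
    target = rng.choice(columns)
    query = f"SELECT {rng.choice(AGG_FUNCS)}({target})"
    if rng.random() < 0.5:
        query += f" AS {target}_agg"
    query += f" FROM {table}"
    if rng.random() < 0.5:
        query += f" GROUP BY {rng.choice(columns)}"
    if rng.random() < 0.5:
        query += f" ORDER BY {rng.choice(columns)} {rng.choice(['ASC', 'DESC'])}"
    return query

def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to the reference after collapsing whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.split())
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

rng = random.Random(0)                      # fixed seed for reproducible samples
cols = ["id", "price", "qty"]               # hypothetical schema
q1 = sample_cs1("orders", cols, rng)
q3 = sample_cs3("orders", cols, rng)
score = exact_match_accuracy(["SELECT a  FROM t", "SELECT b FROM t"],
                             ["SELECT a FROM t", "SELECT c FROM t"])
```

Because every query is produced by an explicit rule, the difficulty of a subset can be raised simply by enabling more optional clauses.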

TinySQL is utilized for structured mechanistic analyses using methods such as activation patching, edge attribution patching, and sparse autoencoders. These techniques expose how transformer heads or MLPs are causally involved in encoding SQL subskills: SELECT–FROM identification, ORDER BY clause synthesis, etc. For instance, in small models, a minimal circuit of 10–15 heads suffices to preserve more than 85% accuracy on basic SQL (Harrasse et al., 17 Mar 2025).
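The idea behind activation patching can be shown on a toy network. The sketch below is not a transformer and not the paper's setup: it is a hand-weighted two-layer ReLU network with invented inputs, used only to show the procedure of running a corrupted input, copying one hidden activation from the clean run, and measuring how much of the clean output is recovered.

```python
from typing import Dict, List, Optional

# Hand-picked weights for a toy network: 2 inputs -> 3 hidden ReLU units -> 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [2.0, 0.0, 0.5]

def hidden(x: List[float]) -> List[float]:
    """Hidden activations: ReLU of each row of W1 applied to x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]

def forward(x: List[float], patch: Optional[Dict[int, float]] = None) -> float:
    """Run the network, optionally overwriting selected hidden activations."""
    h = hidden(x)
    for unit, value in (patch or {}).items():
        h[unit] = value
    return sum(w * hi for w, hi in zip(W2, h))

clean, corrupted = [1.0, 0.0], [0.0, 1.0]   # invented clean/corrupted input pair
clean_h = hidden(clean)
clean_out, corrupted_out = forward(clean), forward(corrupted)

def recovery(unit: int) -> float:
    """Share of the clean-vs-corrupted output gap restored by patching one unit."""
    patched_out = forward(corrupted, {unit: clean_h[unit]})
    return (patched_out - corrupted_out) / (clean_out - corrupted_out)

scores = [recovery(i) for i in range(3)]    # one score per hidden unit
```

A unit whose patch restores most of the gap is causally important for the behavior being studied; in this toy case only unit 0 matters, because the other two units either carry no output weight or take the same value on both inputs.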

A significant insight is that larger models diffuse circuit responsibility across MLPs and layers, while subskill identifiability is clearer in smaller models. Interpretability analyses on TinySQL are feeding back into dataset design, eliminating superficial cues that models might exploit.

4. Comparison and Positioning of TiSQL Frameworks

Although sharing the TiSQL moniker, the temporal SQL extension, the TiInsight text-to-SQL module, and TinySQL for interpretability operate at different logical layers.

| Framework | Primary Function | Methodology | Key Features |
|---|---|---|---|
| Temporal TiSQL (Mkaouar et al., 2011) | Native temporal data & queries | SQL3 extension, declarative | VT/TT stamps, temporal ops, bitemporality |
| TiInsight TiSQL (Zhu et al., 14 Jan 2026) | Natural language to SQL | LLM prompting, schema filtering | HDC, CoT prompts, self-refinement |
| TinySQL (Harrasse et al., 17 Mar 2025) | Mechanistic interpretability of SQL-generating models | Synthetic data, ablation, attribution | Progressive grammatical complexity |

The temporal TiSQL extension is closest to SQL/DBMS semantics, directly impacting how databases store and query temporal facts. TiInsight’s TiSQL is a meta-layer, acting as an intelligent translation interface for end-users, leveraging LLM capabilities without modifying underlying SQL. TinySQL is not an interface or extension, but a research artifact—neither an LLM nor a query engine, but an experimental scaffold to study and reverse-engineer model architectures in the context of SQL.

5. Limitations and Future Directions

For native temporal TiSQL (Mkaouar et al., 2011), practical limitations stem from schema bloat, index maintenance, and the increased algorithmic complexity of interval operations. Temporal pushdown and optimized interval indexing are recommended.

For TiInsight TiSQL (Zhu et al., 14 Jan 2026), the main technical risks are prompt length limits (mitigated via hierarchical data context filtering), hallucination (especially when HDC summaries are imperfect), and the latency imposed by serial LLM refinement. Scalability concerns are mitigated but not eliminated; the paper suggests future research in lightweight LLM fine-tuning and multi-agent decomposition approaches.

TinySQL’s limitation is its current focus on single-table queries, which, while ideal for fine-grained mechanistic interpretability, lacks the complexity of realistic multi-table, join-heavy database environments. Extension to richer query classes is explicitly identified as future work (Harrasse et al., 17 Mar 2025).

6. Impact and Research Significance

Native temporal TiSQL has advanced the declarative management of temporal facts, rendering time a first-class concept in SQL. It catalyzed the formalization of temporal operators, enabling concise point- and interval-based queries—critical in financial, legal, and scientific temporal datasets (Mkaouar et al., 2011).

TiInsight’s TiSQL text-to-SQL methodology represents the operationalization of LLMs in real-world EDA workflows, abstracting away schema complexity and facilitating domain-expert data exploration. Its deployment shows LLMs’ strengths when combined with domain-aware prompt engineering and iterative refinement (Zhu et al., 14 Jan 2026).

TinySQL serves as an interpretability testbed, providing replicable, progressively graded challenge datasets for correlating transformer subcircuits with explicit SQL reasoning tasks. This work demarcates a new experimental regime in mechanistic interpretability and offers a blueprint for future synthetic benchmarks (Harrasse et al., 17 Mar 2025).

The term “TiSQL” therefore encompasses architectures, prompting paradigms, and datasets that each push the evolution of SQL-centric analytics in their respective dimensions: language, automation, and scientific introspection.
