
Warded Bound Datalog: Core of Vadalog

Updated 16 January 2026
  • Warded Bound Datalog is a strict fragment of existential Datalog combining wardedness and piece-wise linearity to control null propagation and recursive joins.
  • It achieves NLOGSPACE data complexity and efficient bounded-memory reasoning, ensuring practical scalability for recursive ontological queries.
  • Implemented in the Vadalog system, where a streaming operator network and guide structures streamline derivations and guarantee termination.

Warded Bound Datalog (the piece-wise linear fragment of warded Datalog±) is a strict syntactic fragment of existential Datalog, based on tuple-generating dependencies (TGDs), that admits tractable and space-efficient reasoning for recursive knowledge graph and ontological queries. It is the logical core of the Vadalog system and combines two independent restrictions: wardedness, which controls the propagation of existential nulls, and piece-wise linearity, which limits mutually recursive joins in rule bodies. The fragment is decidable, has NLOGSPACE-complete data complexity (so reasoning needs only logarithmic space in the size of the data), and is more expressive than piece-wise linear Datalog. Dropping either restriction loses this combination: without wardedness, reasoning becomes undecidable; without piece-wise linearity, data complexity rises to PTIME-complete (Berger et al., 2018).

1. Formal Preliminaries: TGDs, Wardedness, and Piece-wise Linearity

A TGD (tuple-generating dependency) in Datalog± is a first-order logic rule of the form:

$$\forall\bar{x}\,\forall\bar{y}\;\bigl(\phi(\bar{x}, \bar{y}) \rightarrow \exists\bar{z}\;\psi(\bar{x}, \bar{z})\bigr)$$

where $\phi$ (the body) and $\psi$ (the head) are conjunctions of atoms over variables.

  • Frontier variables are those occurring in both body and head: $\mathit{front}(\sigma) = \mathit{var}(\phi) \cap \mathit{var}(\psi)$.
  • Existential variables $\bar{z}$ appear only in the head.
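To make these definitions concrete, here is a minimal Python sketch (an illustration, not code from Vadalog) that represents a TGD as body and head atoms over variable names and computes its frontier and existential variables. The `Atom` and `TGD` classes are hypothetical names, and constants inside atoms are deliberately not modeled. The example encodes rule 4 of the ontology in Section 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    pred: str               # predicate name, e.g. "Type"
    args: tuple[str, ...]   # variable names (constants are not modeled)


@dataclass(frozen=True)
class TGD:
    body: tuple[Atom, ...]  # the conjunction phi
    head: tuple[Atom, ...]  # the conjunction psi

    def body_vars(self) -> set[str]:
        return {v for atom in self.body for v in atom.args}

    def head_vars(self) -> set[str]:
        return {v for atom in self.head for v in atom.args}

    def frontier(self) -> set[str]:
        # front(sigma) = var(phi) ∩ var(psi)
        return self.body_vars() & self.head_vars()

    def existentials(self) -> set[str]:
        # head variables that do not occur in the body
        return self.head_vars() - self.body_vars()


# Rule 4: Type(x,y), Restriction(y,p) -> ∃w Triple(x,p,w)
rule4 = TGD(
    body=(Atom("Type", ("x", "y")), Atom("Restriction", ("y", "p"))),
    head=(Atom("Triple", ("x", "p", "w")),),
)
print(sorted(rule4.frontier()))      # ['p', 'x']
print(sorted(rule4.existentials()))  # ['w']
```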

Wardedness governs how existential nulls (labeled nulls) can appear in the derivation (the “chase”):

  • Affected positions are argument positions that may receive a null during the chase. They are defined inductively: a position is affected if an existential variable occurs there in some head, or if some rule has a frontier variable at that position in its head while all of that variable's body occurrences are at affected positions.
  • In a TGD, a body variable is harmless if at least one of its occurrences is at a non-affected position, and harmful otherwise; a harmful variable that also belongs to the frontier is dangerous.
  • A TGD is warded if it has no dangerous variables, or if some body atom (the ward) contains all of them and shares only harmless variables with the rest of the body.
  • The class of all finite sets of warded TGDs is denoted WARD.
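The inductive definition of affected positions is a fixpoint computation, after which the ward condition can be checked rule by rule. The sketch below assumes a hypothetical encoding: each rule is a `(body, head)` pair of lists of `(predicate, variables)` atoms, positions are 0-based, and constants are ignored. The small parent/grandparent program is an illustrative choice, not taken from the cited papers.

```python
def variables(atoms):
    """All variables occurring in a list of (predicate, args) atoms."""
    return {v for _, args in atoms for v in args}


def positions_of(var, atoms):
    """Positions (predicate, 0-based index) at which var occurs in atoms."""
    return {(p, i) for p, args in atoms for i, v in enumerate(args) if v == var}


def affected_positions(rules):
    affected = set()
    # Base case: positions of existential variables in rule heads.
    for body, head in rules:
        for z in variables(head) - variables(body):
            affected |= positions_of(z, head)
    # Inductive case: a frontier variable whose body occurrences are all at
    # affected positions makes its head positions affected. Repeat to fixpoint.
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for x in variables(body) & variables(head):
                if positions_of(x, body) <= affected:
                    new = positions_of(x, head) - affected
                    if new:
                        affected |= new
                        changed = True
    return affected


def is_warded(rule, affected):
    body, head = rule
    # Harmful: every body occurrence is at an affected position.
    harmful = {v for v in variables(body) if positions_of(v, body) <= affected}
    dangerous = harmful & variables(head)  # harmful and in the frontier
    if not dangerous:
        return True
    for i, (_, args) in enumerate(body):
        if dangerous <= set(args):
            rest = variables(body[:i] + body[i + 1:])
            # The ward may share only harmless variables with the rest of the body.
            if not (set(args) & rest & harmful):
                return True
    return False


rules = [
    ([("Person", ("x",))], [("Parent", ("x", "y"))]),   # Person(x) -> ∃y Parent(x,y)
    ([("Parent", ("x", "y"))], [("Person", ("y",))]),   # Parent(x,y) -> Person(y)
    ([("Parent", ("x", "y")), ("Parent", ("y", "z"))],  # Parent(x,y), Parent(y,z)
     [("Grandparent", ("x", "z"))]),                    #   -> Grandparent(x,z)
]
aff = affected_positions(rules)
print(sorted(aff))
print([is_warded(r, aff) for r in rules])  # [True, True, False]
```

The third rule fails because its dangerous variables `x` and `z` sit in different atoms that are joined on `y`, a harmful variable that may carry a null.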

Piece-wise linearity generalizes classic linear Datalog:

  • For a set of TGDs $\Sigma$, consider the predicate graph $PG(\Sigma)$ with predicates as nodes; an edge $P \to Q$ is present if some rule has $P$ in its body and $Q$ in its head.
  • Predicates $P$ and $Q$ are mutually recursive if they lie on a common cycle of $PG(\Sigma)$, i.e., in the same strongly connected component.
  • $\Sigma$ is piece-wise linear (PWL) if in every rule at most one body atom has a predicate that is mutually recursive with a predicate in the head.
  • The intersection WARD∩PWL defines the fragment of warded piece-wise linear TGDs.
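Piece-wise linearity depends only on predicates, so checking it needs just the predicate graph. The following sketch assumes a hypothetical encoding in which each rule is reduced to its list of body predicates (one entry per atom) and its list of head predicates; `SubClassStar` stands for SubClass*. It tests mutual recursion by reachability and applies the PWL condition to the rule set of Section 5 and to non-linear transitive closure.

```python
def predicate_graph(rules):
    """Edges P -> Q for every rule with P in the body and Q in the head."""
    edges = {}
    for body, head in rules:
        for p in body:
            edges.setdefault(p, set()).update(head)
    return edges


def reachable(edges, start):
    """Predicates reachable from start by a path of length >= 1."""
    seen, stack = set(), [start]
    while stack:
        for q in edges.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def mutually_recursive(edges, p, q):
    # p and q lie on a common cycle, i.e. in the same strongly connected component
    return q in reachable(edges, p) and p in reachable(edges, q)


def is_pwl(rules):
    edges = predicate_graph(rules)
    for body, head in rules:
        # Count body atoms (not distinct predicates) recursive with some head predicate.
        recursive = [p for p in body if any(mutually_recursive(edges, p, h) for h in head)]
        if len(recursive) > 1:
            return False
    return True


# Predicate skeleton of the example in Section 5
owl_example = [
    (["SubClass"], ["SubClassStar"]),                  # rule 1
    (["SubClassStar", "SubClass"], ["SubClassStar"]),  # rule 2
    (["Type", "SubClassStar"], ["Type"]),              # rule 3
    (["Type", "Restriction"], ["Triple"]),             # rule 4
    (["Triple", "Inverse"], ["Triple"]),               # rule 5
    (["Triple", "Restriction"], ["Type"]),             # rule 6
]
# Non-linear transitive closure: E(x,y) -> T(x,y); T(x,y), T(y,z) -> T(x,z)
nonlinear_tc = [(["E"], ["T"]), (["T", "T"], ["T"])]

print(is_pwl(owl_example))   # True
print(is_pwl(nonlinear_tc))  # False
```

Reachability per query keeps the sketch short; a real checker would compute strongly connected components once, for example with Tarjan's algorithm.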

2. Decidability and Complexity Bounds

Query answering under unrestricted existential rules is undecidable, since unbounded recursion combined with uncontrolled null propagation can encode arbitrary computations. Warded Bound Datalog (WARD∩PWL) is both decidable and highly tractable:

  • Undecidability of PWL Alone:

Conjunctive query (CQ) answering under PWL sets of TGDs is undecidable. The proof is a reduction from the unbounded tiling problem: without wardedness to restrict where nulls flow, piece-wise linear recursion already suffices to simulate an arbitrary tiling, which breaks decidability (Berger et al., 2018).

  • WARD∩PWL Complexity:
    • Data Complexity: NLOGSPACE-complete; the space required is $O(\log|D|)$ for a database $D$ and polynomial in $|\Sigma| + |q|$, where $q$ is the query.
    • Combined Complexity: PSPACE-complete.
  • Comparison:
    • General warded TGDs (WARD) yield PTIME-complete data complexity and ExpTime-complete combined complexity.
    • Piece-wise linearity without wardedness is not enough: CQ answering becomes undecidable.

3. Expressiveness Relations

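The expressive strength of Warded Bound Datalog is measured along two axes: combined expressive power, which treats a rule set together with a fixed query as one query over databases, and program expressive power, which treats the rule set alone, evaluated against all conjunctive queries.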

Language/Class | Combined expressive power (cep) | Program expressive power (pep)
Piece-wise linear Datalog (PWL-DATALOG) | = WARD∩PWL | < WARD∩PWL
Plain Datalog | = WARD | < WARD

  • Combined Power:

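PWL-DATALOG and WARD∩PWL are equally powerful as queries: every warded piece-wise linear query can be rewritten into an equivalent piece-wise linear Datalog query, and conversely every PWL-DATALOG query is already in WARD∩PWL, since full rules invent no nulls and are therefore trivially warded.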

  • Program Power:

WARD∩PWL strictly extends PWL-DATALOG due to its ability to invent new nulls (witnesses), thus supporting ontological patterns beyond plain recursion. The full warded fragment (WARD) strictly extends plain Datalog by supporting existential rules in addition to full rules.

4. Algorithmic Insights and Boundedness in Vadalog

The Vadalog system, which serves as the primary implementation of these theoretical results, employs algorithmic structures that exploit the two-fold restriction:

  • Guide Structures:

Linear forests, warded forests, and lifted linear forests are built to model the bounded propagation of atoms and nulls during the chase, enabling effective pruning of repeated or isomorphic derivations.

  • Streaming Operator Network:
    • Rank vectors attached to facts increase monotonically and count existential null introductions; they are used to prune redundant derivations.
    • Per-rule memoization ensures that each rule fires at most once per variable binding.
  • Termination and NLOGSPACE Guarantee:

Owing to the WARD∩PWL restriction, query answering can be decided by a nondeterministic procedure in which every branch uses only $O(\log|D|)$ working space (Berger et al., 2018). The number of distinct facts (up to renaming of labeled nulls) and chase steps that must be explored is polynomial in $|D|$, which guarantees termination. Windowed streaming architectures and volcano-iterator patterns, as realized in the Vadalog engine, keep the memory footprint explicitly bounded even on massive real-world datasets (Baldazzi et al., 2023).
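Two of the pruning ideas above, skipping a fact that is isomorphic to one already derived and firing a rule at most once per binding, can be sketched as follows. This is a deliberately simplified stand-in, not Vadalog's implementation: the guide structures track more context than a single fact. The `_:` prefix for labeled nulls and the `canonical` function and `Deduplicator` class are hypothetical.

```python
def is_null(term):
    # Hypothetical convention: labeled nulls are written with a "_:" prefix.
    return term.startswith("_:")


def canonical(fact, null_test):
    """Rename nulls to ("null", k) in order of first occurrence; keep constants.
    Two single facts are isomorphic (equal up to a bijective renaming of
    nulls) exactly when their canonical forms are equal."""
    pred, args = fact
    renaming = {}
    out = []
    for term in args:
        if null_test(term):
            if term not in renaming:
                renaming[term] = ("null", len(renaming))
            out.append(renaming[term])
        else:
            out.append(term)
    return (pred, tuple(out))


class Deduplicator:
    def __init__(self, null_test):
        self.null_test = null_test
        self.seen_facts = set()  # canonical forms of admitted facts
        self.fired = set()       # (rule_id, binding) pairs already applied

    def admit(self, fact):
        """Return False if fact is isomorphic to one seen before (prune it)."""
        key = canonical(fact, self.null_test)
        if key in self.seen_facts:
            return False
        self.seen_facts.add(key)
        return True

    def fire_once(self, rule_id, binding):
        """Return True only the first time a rule is applied to a binding."""
        key = (rule_id, tuple(sorted(binding.items())))
        if key in self.fired:
            return False
        self.fired.add(key)
        return True


d = Deduplicator(is_null)
print(d.admit(("Triple", ("alice", "hasParent", "_:n1"))))  # True
print(d.admit(("Triple", ("alice", "hasParent", "_:n7"))))  # False: isomorphic
print(d.admit(("Triple", ("_:n7", "hasChild", "alice"))))   # True
print(d.fire_once("r4", {"x": "alice", "y": "Person"}))     # True
print(d.fire_once("r4", {"y": "Person", "x": "alice"}))     # False: same binding
```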

5. Illustrative Example and Fragment Verification

Consider an OWL 2 QL ontology encoded using the following TGDs, where $\mathit{SubClass}^*$ denotes the transitive closure of $\mathit{SubClass}$:

  1. $\mathit{SubClass}(x,y) \to \mathit{SubClass}^*(x,y)$
  2. $\mathit{SubClass}^*(x,y), \mathit{SubClass}(y,z) \to \mathit{SubClass}^*(x,z)$
  3. $\mathit{Type}(x,y), \mathit{SubClass}^*(y,z) \to \mathit{Type}(x,z)$
  4. $\mathit{Type}(x,y), \mathit{Restriction}(y,p) \to \exists w\,\mathit{Triple}(x,p,w)$
  5. $\mathit{Triple}(x,p,y), \mathit{Inverse}(p,q) \to \mathit{Triple}(y,q,x)$
  6. $\mathit{Triple}(x,p,y), \mathit{Restriction}(q,p) \to \mathit{Type}(x,q)$

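This rule set is warded: in every rule, the dangerous variables (if any) occur together in a single ward atom, and the ward shares only harmless variables with the other body atom. It is not linear: rule 3 has two intensional body atoms, $\mathit{Type}$ and $\mathit{SubClass}^*$ (rules 2 and 4 to 6 each join one intensional atom with an extensional one). It is, however, piece-wise linear: in every rule, at most one body atom's predicate is mutually recursive with the head predicate. In rule 3, for instance, $\mathit{SubClass}^*$ is not mutually recursive with $\mathit{Type}$, because no rule derives $\mathit{SubClass}^*$ from $\mathit{Type}$. Thus, the set lies in WARD∩PWL (Berger et al., 2018).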
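The wardedness claim can be checked by hand against the definitions of Section 1. Writing $P[i]$ for the $i$-th argument position of predicate $P$, the affected positions arise as follows; the remaining propagations (rules 3 and 4 from $\mathit{Type}[1]$, rule 5 from $\mathit{Triple}[1]$) only reach positions that are already affected.

$$
\begin{aligned}
&\text{Rule 4: } w \text{ is existential at } \mathit{Triple}[3] &&\Rightarrow\ \mathit{Triple}[3] \in \mathrm{aff}(\Sigma)\\
&\text{Rule 5: } y \text{ occurs in the body only at } \mathit{Triple}[3] &&\Rightarrow\ \mathit{Triple}[1] \in \mathrm{aff}(\Sigma)\\
&\text{Rule 6: } x \text{ occurs in the body only at } \mathit{Triple}[1] &&\Rightarrow\ \mathit{Type}[1] \in \mathrm{aff}(\Sigma)\\
&\mathrm{aff}(\Sigma) = \{\mathit{Triple}[1],\ \mathit{Triple}[3],\ \mathit{Type}[1]\}
\end{aligned}
$$

$$
\begin{aligned}
&\text{Rule 3: dangerous } \{x\}, && \text{ward } \mathit{Type}(x,y), && \text{shared } y \text{ harmless (at } \mathit{Type}[2])\\
&\text{Rule 4: dangerous } \{x\}, && \text{ward } \mathit{Type}(x,y), && \text{shared } y \text{ harmless (at } \mathit{Type}[2])\\
&\text{Rule 5: dangerous } \{x, y\}, && \text{ward } \mathit{Triple}(x,p,y), && \text{shared } p \text{ harmless (at } \mathit{Triple}[2])\\
&\text{Rule 6: dangerous } \{x\}, && \text{ward } \mathit{Triple}(x,p,y), && \text{shared } p \text{ harmless (at } \mathit{Triple}[2])
\end{aligned}
$$

Rules 1 and 2 touch no affected position, so they have no dangerous variables. In rule 6, $y$ is harmful but does not occur in the head, so it is not dangerous.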

Within Vadalog, query evaluation over such rules uses guide structures and operator-pipeline optimizations so that only new, non-redundant facts are considered; wardedness keeps null propagation confined, which is what allows the engine to cut off infinite derivations (Baldazzi et al., 2023, Bellomarini et al., 2018).

6. Streaming-Based Reasoning and Practical Boundedness

Modern streaming-friendly chase variants implement the theoretical WARD∩PWL principles at scale:

  • Bounded Warded Chase employs incremental evaluation, fact windowing, and lightweight reference-counted evictions:
    • Each newly derived fact is queued, indexed, and matched only with rules for which it is relevant.
    • Once the in-memory window exceeds a programmatically determined bound (polynomial in $|D|$), facts that cannot participate in future derivations are evicted (Baldazzi et al., 2023).
  • Window Ejection and Early Pruning:

By exploiting the fact that all dangerous variables of a rule are confined to a single body atom (the ward), the system detects derivation boundaries and stops considering facts that cannot propagate new nulls or witness values.

  • Incremental and Pipelined Execution:

The entire query workload is decomposed into pipelined, volcano-style operator components, supporting highly parallelizable, low-memory reasoning suitable for large-scale industrial settings.

  • Implementation Outcomes:

Empirical evaluation on industrial-scale ontological reasoning showed that Vadalog, when restricted to warded (and especially warded+PWL) programs, consistently maintains a memory footprint of a few hundred megabytes—even on knowledge graphs with tens of millions of facts (Baldazzi et al., 2023). The combination of theoretical boundedness and pragmatic pipeline implementation realizes both scalability and tractable reasoning.
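The volcano-style pattern can be illustrated with Python generators, where each operator pulls tuples from its input only when asked. The sketch below evaluates rule 3 of the Section 5 example as scan, hash join, projection, and duplicate elimination. The operator names and sample facts are hypothetical, and Vadalog's actual operators, windowing, and eviction policy are more involved.

```python
from collections import defaultdict


def scan(relation):
    """Leaf operator: produce stored tuples one at a time."""
    for row in relation:
        yield row


def hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input, then stream the left input through it."""
    table = defaultdict(list)
    for right_row in right:
        table[right_row[right_key]].append(right_row)
    for left_row in left:
        for right_row in table.get(left_row[left_key], ()):
            yield left_row + right_row


def project(rows, columns):
    for row in rows:
        yield tuple(row[c] for c in columns)


def distinct(rows):
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Rule 3: Type(x,y), SubClass*(y,z) -> Type(x,z)
type_facts = [("alice", "Student"), ("bob", "Professor")]
subclass_star = [("Student", "Person"), ("Professor", "Employee"), ("Professor", "Person")]

pipeline = distinct(
    project(hash_join(scan(type_facts), scan(subclass_star), 1, 0), (0, 3))  # (x,y,y,z) -> (x,z)
)
result = list(pipeline)  # pulling from the root drives the whole pipeline
print(result)  # [('alice', 'Person'), ('bob', 'Employee'), ('bob', 'Person')]
```

Nothing is computed until `list` pulls from the last operator. Only the hash table on the join's build side and the state of `distinct` are materialized, which is the kind of state a memory-bounded engine would need to window or evict.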

7. Summary and Significance

Warded Bound Datalog—formally, the intersection WARD∩PWL—constitutes the space-efficient, highly tractable fragment at the core of Vadalog’s reasoning capability. Its defining design is the confluence of:

  • Wardedness: controlling unbounded existential null propagation,
  • Piece-wise linear recursion: minimizing the risk of mutually recursive blow-ups in the chase,
  • Expressivity exceeding piece-wise linear Datalog, and
  • Decidability and NLOGSPACE data complexity, optimal for query-driven reasoning over large knowledge bases.

Theoretical guarantees are realized in production-grade systems via pipeline-oriented, window-bounded chase algorithms. As a consequence, Warded Bound Datalog enables both robust theoretical guarantees and scalable, practical reasoning for complex recursive ontological workloads (Berger et al., 2018, Bellomarini et al., 2018, Baldazzi et al., 2023).
