An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach

Published 29 Apr 2025 in cs.CL, cs.IR, and cs.AI | (2505.00039v5)

Abstract: Retrieval-Augmented Generation (RAG) systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers. This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms. We ground our knowledge graph in a formal, LRMoo-inspired model that distinguishes abstract legal Works from their versioned Expressions. We model temporal states as efficient aggregations that reuse the versioned expressions (CTVs) of unchanged components, and we reify legislative events as first-class Action nodes to make causality explicit and queryable. This structured backbone enables a unified, planner-guided query strategy that applies explicit policies to deterministically resolve complex requests for (i) point-in-time retrieval, (ii) hierarchical impact analysis, and (iii) auditable provenance reconstruction. Through a case study on the Brazilian Constitution, we demonstrate how this approach provides a verifiable, temporally-correct substrate for LLMs, enabling higher-order analytical capabilities while drastically reducing the risk of factual errors. The result is a practical framework for building more trustworthy and explainable legal AI systems.

Summary

  • The paper introduces SAT-Graph RAG, which integrates a formal legal ontology with structural and temporal segmentation for deterministic retrieval.
  • It employs a multi-layered knowledge graph that models hierarchical components, language versions, and legislative actions for precise versioning.
  • The framework facilitates auditable provenance reconstruction and efficient impact analysis, as demonstrated on the Brazilian Constitution.

Introduction

The paper presents the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework for Retrieval-Augmented Generation (RAG) in the legal domain. Standard RAG systems typically treat the corpus as flat text and ignore when each provision was in force. SAT-Graph RAG addresses both limitations by explicitly modeling the hierarchical, diachronic, and causal structure of legal norms. The approach is grounded in a formal, LRMoo-inspired ontology, enabling deterministic, auditable, and context-rich retrieval for legal AI applications.

Formal Ontological Model and Graph Construction

The SAT-Graph RAG framework is built upon a multi-layered ontological model that distinguishes between abstract legal Works, their hierarchical Components, Temporal Versions (CTVs), and Language Versions (CLVs). This separation allows for precise modeling of the evolution of legal texts over time and across languages.

The graph construction process begins with structure-aware semantic segmentation of legal texts, mapping each segment to its corresponding hierarchical component (e.g., Title, Chapter, Article). These components are instantiated as nodes in the knowledge graph, forming a backbone that mirrors the formal structure of the legal document (Figures 1 and 2).

Figure 1: Example of articulated text for Art. 12 of the Federal Constitution of Brazil (1988) with annotations indicating the types of hierarchical provisions/components.

Figure 2: Hierarchical semantic segmentation and typification of structural entities applied to a passage of the Brazilian Constitution, with Norms in green and Components in blue.

Textual content is linked to the most specific layer, the Language Version nodes, so that every retrievable text unit is unambiguously tied to both a semantic state and a linguistic expression (Figure 3).

Figure 3: Multi-layered relationship in the graph: Norms have hierarchical Components, which have date-stamped Temporal Versions, which in turn have language-specific Language Versions. Text Chunks are linked to the CLVs.
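
The layering in Figure 3 can be sketched as a small set of data classes. This is a minimal illustration, not the paper's schema: the class and field names, the `add_language_version` helper, and the placeholder texts are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class LanguageVersion:
    """CLV: the text of one temporal version in one language (hypothetical fields)."""
    language: str
    text: str


@dataclass
class TemporalVersion:
    """CTV: the state of a component over a validity interval."""
    valid_from: date
    valid_to: Optional[date] = None  # None means still in force (assumed convention)
    language_versions: dict[str, LanguageVersion] = field(default_factory=dict)


@dataclass
class Component:
    """Hierarchical unit such as a Title, Chapter, or Article."""
    component_id: str
    kind: str
    children: list["Component"] = field(default_factory=list)
    temporal_versions: list[TemporalVersion] = field(default_factory=list)


@dataclass
class Norm:
    """Abstract legal Work that owns the component hierarchy."""
    norm_id: str
    root: Component


def add_language_version(ctv: TemporalVersion, language: str, text: str) -> TemporalVersion:
    """Attach another language to an existing CTV without touching the structure."""
    ctv.language_versions[language] = LanguageVersion(language, text)
    return ctv


# Usage: one article, one temporal version, two languages on the same backbone.
article = Component("art12", "Article")
original = TemporalVersion(valid_from=date(1988, 10, 5))
article.temporal_versions.append(original)
add_language_version(original, "pt", "<Portuguese text placeholder>")
add_language_version(original, "en", "<English text placeholder>")
```

Adding the English text creates no new Component or TemporalVersion, which is the property the multilingual design relies on.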

The model supports multilingual corpora by attaching new Language Versions to existing Temporal Versions, so structural and temporal information is never duplicated per language (Figure 4).

Figure 4: Representation of multilingual content (Portuguese and English) linked to the same temporal and structural backbone.

Temporal Aggregation and Efficient Versioning

A key innovation is the aggregation model for propagating changes. When a component is amended, only that component receives a new Temporal Version with new content. Each of its ancestors receives a new CTV that aggregates the new child version together with the existing, most recent CTVs of its unchanged children. Reusing unchanged CTVs avoids redundancy and enables efficient, deterministic reconstruction of the law's state at any point in time (Figures 5 and 6).

Figure 5: New Temporal Versions of "tit2" (Title II) derived from new CTVs of some children, with unchanged child components reusing their most recent CTVs.

Figure 6: Aggregation relationships between Temporal Versions, showing reuse of child CTVs by multiple parent CTVs at different times.
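
A rough sketch of this aggregation, assuming immutable version objects and a single parent level. The `CTV` class, the `amend` function, the child identifiers `cap1` and `cap2`, and the amendment date are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CTV:
    """Immutable temporal version; a parent CTV aggregates child CTVs."""
    component_id: str
    valid_from: date
    children: tuple["CTV", ...] = ()
    text: str = ""


def amend(parent: CTV, child_id: str, new_text: str, when: date) -> CTV:
    """Return a new parent CTV in which only `child_id` has a new CTV.

    Unchanged children are reused by reference rather than copied.
    """
    if all(c.component_id != child_id for c in parent.children):
        raise KeyError(child_id)
    new_children = tuple(
        CTV(c.component_id, when, c.children, new_text)
        if c.component_id == child_id
        else c
        for c in parent.children
    )
    return CTV(parent.component_id, when, new_children, parent.text)


# Usage with hypothetical children of "tit2" and an illustrative amendment date.
cap1 = CTV("cap1", date(1988, 10, 5), text="cap1 v1")
cap2 = CTV("cap2", date(1988, 10, 5), text="cap2 v1")
tit2_v1 = CTV("tit2", date(1988, 10, 5), (cap1, cap2))
tit2_v2 = amend(tit2_v1, "cap2", "cap2 v2", date(2000, 1, 1))
```

Both `tit2_v1` and `tit2_v2` point at the same `cap1` object, which mirrors the reuse of a child CTV by several parent CTVs shown in Figure 6. A full implementation would repeat the step for every ancestor up to the Norm.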

Causality and Metadata as First-Class Entities

Legislative events (amendments, repeals, enactments) are modeled as Action nodes, making causality explicit and queryable. Each Action node is associated with a descriptive Text Unit, enabling semantic search over legislative history and provenance (Figure 7).

Figure 7: Legislative Action in the knowledge graph, showing how an amendment terminates the validity of an old CTV and produces a new one.
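
The effect of an Action on version validity, and the way a chain of Actions yields a provenance trail, might be sketched as follows. The field names (`terminates`, `produces`), the identifiers, and the dates are assumptions made for illustration; the sketch also assumes each version is produced by at most one Action.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Version:
    version_id: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still in force


@dataclass
class Action:
    action_id: str
    kind: str                  # e.g. "amendment", "repeal", "enactment"
    when: date
    terminates: Optional[str]  # id of the version whose validity ends, if any
    produces: Optional[str]    # id of the version created, if any
    description: str = ""      # text that makes the event searchable


def apply_action(versions: dict[str, Version], action: Action) -> None:
    """Close the old version and open the new one on the action date."""
    if action.terminates is not None:
        versions[action.terminates].valid_to = action.when
    if action.produces is not None:
        versions[action.produces] = Version(action.produces, action.when)


def lineage(actions: list[Action], version_id: str) -> list[Action]:
    """Walk back from a version through the actions that led to it, oldest first."""
    produced_by = {a.produces: a for a in actions if a.produces is not None}
    chain: list[Action] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current in produced_by and current not in seen:
        seen.add(current)
        action = produced_by[current]
        chain.append(action)
        current = action.terminates
    return list(reversed(chain))


# Usage with hypothetical identifiers and illustrative dates.
actions = [
    Action("act0", "enactment", date(1988, 10, 5), None, "v1"),
    Action("act1", "amendment", date(2000, 1, 1), "v1", "v2"),
    Action("act2", "amendment", date(2010, 1, 1), "v2", "v3"),
]
versions: dict[str, Version] = {}
for act in actions:
    apply_action(versions, act)
history = lineage(actions, "v3")  # ordered chain from enactment to latest amendment
```

Because every change passes through an Action, the question "why does this text read the way it does?" reduces to a graph walk rather than a text comparison.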

Structured metadata and informative relationships are also textualized into dedicated Text Units, supporting multi-aspect retrieval and enabling queries over both content and context (Figure 8).

Figure 8: Knowledge graph illustrating Text Units derived from Language Versions (content) and other entities (Norm, Component, Temporal Version, Action) representing metadata and relationships.

Structure-Aware and Thematic Retrieval

The framework leverages curated communities intrinsic to legal documents: internal hierarchy (structural communities) and external thematic classification (topical communities). Theme nodes group Norms and Components by legal topics, each with a human-authored description, enabling cross-document, topically-coherent retrieval (Figure 9).

Figure 9: Inter-norm and component aggregation by legal Theme entities, representing higher-level communities in the knowledge graph.

Users can select a scope (Theme, Norm, Component, or Version) to filter retrieval, turning search from a flat, corpus-wide operation into semantic navigation within a contextually relevant subgraph (Figure 10).

Figure 10: User selection of scope (Theme, Norm, Component, or Version) to filter retrieval of relevant Text Units from the knowledge graph.
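
Scope filtering can be illustrated as a graph traversal followed by ranking restricted to the nodes reached. This is a sketch under assumptions: the adjacency dictionary, the node identifiers, and the sample texts are hypothetical, and the token-overlap score is a toy stand-in for whatever embedding similarity a real system would use.

```python
from collections import deque


def nodes_in_scope(children: dict[str, list[str]], scope_id: str) -> set[str]:
    """All nodes reachable from the scope node via containment or grouping edges."""
    seen = {scope_id}
    queue = deque([scope_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def scoped_search(children, text_units, scope_id, query, top_k=3):
    """Rank only the text units attached to nodes inside the selected scope."""
    scope = nodes_in_scope(children, scope_id)
    candidates = [
        (overlap_score(query, text), node)
        for node, text in text_units.items()
        if node in scope
    ]
    ranked = sorted(candidates, key=lambda item: (-item[0], item[1]))
    return [node for score, node in ranked[:top_k] if score > 0]


# Usage with a hypothetical two-norm graph grouped under one theme.
children = {
    "theme:social": ["norm:A"],
    "norm:A": ["art1", "art2"],
    "norm:B": ["art9"],
}
text_units = {
    "art1": "right to education",
    "art2": "right to health",
    "art9": "education funding rules",
}
hits = scoped_search(children, text_units, "norm:A", "education")
```

With the scope set to `norm:A`, the matching unit in `norm:B` is never scored, so it cannot leak into the result regardless of how similar it is to the query.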

Case Study: Brazilian Constitution

The framework is demonstrated on the Brazilian Federal Constitution of 1988, which has undergone extensive amendment. The case study illustrates three critical capabilities:

  1. Deterministic Point-in-Time Retrieval: The system retrieves the exact version of a provision valid on any historical date, using a planner-guided query strategy that canonicalizes structural and temporal constraints, traverses the graph to select valid CTVs, and retrieves the corresponding CLVs.
  2. Hierarchical Impact Analysis: The system aggregates legislative changes across structural sections (e.g., all amendments to Chapter II after 2010), leveraging the explicit hierarchy and Action nodes to produce structured summaries.
  3. Auditable Provenance Reconstruction: The system traces the full causal lineage of textual changes, assembling ordered chains of Action nodes and providing machine-readable provenance reports (Figure 11).

Figure 11: Original Version and subsequent Versions of Article 6 of the Brazilian Constitution generated by three Constitutional Amendments.
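
The first capability, point-in-time retrieval, comes down to selecting the one version whose validity interval contains the requested date and then reading its text in the requested language. The sketch below assumes half-open intervals (start inclusive, end exclusive); the `VersionRecord` class, the function names, the dates, and the placeholder texts are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class VersionRecord:
    component_id: str
    valid_from: date           # inclusive
    valid_to: Optional[date]   # exclusive; None means still in force
    texts: dict[str, str] = field(default_factory=dict)  # language -> text


def version_at(records, component_id: str, when: date) -> Optional[VersionRecord]:
    """Select the single version of a component that was valid on `when`."""
    matches = [
        r for r in records
        if r.component_id == component_id
        and r.valid_from <= when
        and (r.valid_to is None or when < r.valid_to)
    ]
    if len(matches) > 1:
        # Overlapping intervals would make the answer ambiguous, so fail loudly.
        raise ValueError(f"overlapping versions for {component_id} on {when}")
    return matches[0] if matches else None


def text_at(records, component_id: str, when: date, language: str = "pt") -> Optional[str]:
    """Return the text of the valid version in the requested language, if present."""
    record = version_at(records, component_id, when)
    return record.texts.get(language) if record else None


# Usage with illustrative dates and placeholder texts.
records = [
    VersionRecord("art6", date(1988, 10, 5), date(2000, 1, 1), {"pt": "<text v1>"}),
    VersionRecord("art6", date(2000, 1, 1), None, {"pt": "<text v2>"}),
]
answer = text_at(records, "art6", date(1995, 6, 1))
```

No similarity score takes part in the selection, so the same question about the same date always yields the same version.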

Unified, Deterministic Query Execution

A modular, planner-guided execution strategy supports all query patterns, centralizing constraint extraction, scope resolution, strategy selection, deterministic CTV selection, and fact-grounded generation. Operational defaults (embedding model, similarity function, temporal policy) are disclosed with each response, ensuring auditability and reproducibility.
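
As a loose illustration of the planning step, the sketch below extracts a component reference and a date from a query, picks a strategy, and carries the operational defaults along so they can be disclosed with the response. The keyword rules are a rule-based stand-in assumed here for brevity, not the paper's planner, and the `DEFAULTS` values, including the embedding model name, are placeholders.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Placeholder operational defaults; the values are illustrative only.
DEFAULTS = {
    "embedding_model": "example-embedding-model",
    "similarity": "cosine",
    "temporal_policy": "version valid on the requested date",
}


@dataclass
class Plan:
    component: Optional[str]
    as_of: Optional[date]
    strategy: str
    policies: dict = field(default_factory=lambda: dict(DEFAULTS))


def make_plan(query: str) -> Plan:
    """Extract constraints from the query and choose a retrieval strategy."""
    article = re.search(r"\bart(?:icle|\.)?\s*(\d+)", query, re.IGNORECASE)
    day = re.search(r"\b(\d{4})-(\d{2})-(\d{2})\b", query)
    component = f"art{article.group(1)}" if article else None
    as_of = date(*map(int, day.groups())) if day else None

    if re.search(r"\b(history|lineage|provenance)\b", query, re.IGNORECASE):
        strategy = "provenance"
    elif as_of is not None:
        strategy = "point_in_time"
    elif re.search(r"\b(changes|amendments|impact)\b", query, re.IGNORECASE):
        strategy = "impact_analysis"
    else:
        strategy = "semantic_search"
    return Plan(component, as_of, strategy)


# Usage: the plan, including its policies, can be returned alongside the answer.
plan = make_plan("What did Article 6 say on 2001-03-15?")
```

Keeping the policies on the plan object makes it straightforward to log them with every response, which is what makes a result reproducible later.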

Discussion: Implications, Scalability, and Evaluation

The SAT-Graph RAG framework offers deterministic, explainable retrieval for legal AI, supporting high-stakes applications where precision and auditability are paramount. Its scalability depends on principled data curation, incremental processing, and robust validation workflows. The approach generalizes to other domains with explicit structure and versioning, such as contracts and technical documentation.

Quantitative evaluation requires dedicated benchmarks measuring temporal precision, action-attribution accuracy, causal-chain completeness, and user-centered metrics. The development of annotated testbeds is advocated to enable reproducible comparison of temporally-aware legal retrieval systems.

Ethical deployment mandates transparency of internal policies, equitable access to high-quality legal data, and immutable audit logs for all graph updates.

Conclusion

The SAT-Graph RAG framework advances legal AI by integrating formal document structure, temporal versioning, and explicit causality into a unified, ontology-driven knowledge graph. It enables deterministic, auditable retrieval and higher-order analytical capabilities, addressing the critical limitations of standard RAG systems in the legal domain. The approach lays a robust foundation for trustworthy, explainable legal AI, with future work focused on formal ontology publication, benchmark development, and extension to other domains and hybrid models.
