Dynamic Event Units in DyG-RAG
- The paper introduces DEUs as minimal, time-anchored factual units that enable interpretable and efficient multi-hop temporal reasoning.
- The extraction pipeline uses document chunking, temporal parsing, and information filtering to accurately detect and encode events from text.
- The dynamic event graph constructs weighted links over DEUs based on entity co-occurrence and temporal proximity, enhancing timeline retrieval.
Dynamic Event Units (DEUs) represent the core abstraction for temporal reasoning in DyG-RAG, a dynamic graph retrieval-augmented generation framework. DEUs provide a time-anchored, semantically precise unit for representing events or states extracted from unstructured text, overcoming the temporal ambiguity of traditional retrieval units. By encoding both factual and temporal attributes and linking these units into an event-centric dynamic graph, DyG-RAG supports interpretable, faithful retrieval and complex multi-hop temporal reasoning.
1. Formal Definition and Semantic–Temporal Encoding
A Dynamic Event Unit (DEU) in DyG-RAG is defined as the minimal, self-contained factual “factoid” that is explicitly anchored in time and suitable for direct retrieval:

$$e = (s, t, \text{eid}, \text{did})$$

where:
- $s$: factual sentence describing an event or state
- $t$: normalized timestamp (date or interval anchor)
- $\text{eid}$: unique identifier for the event
- $\text{did}$: identifier for the source document or chunk
Each DEU is encoded into a joint semantic–temporal vector:

$$\mathbf{v} = [\,\mathbf{v}_s \,\|\, \mathbf{v}_t\,]$$

where $\mathbf{v}_s$ is a semantic embedding of $s$ and $\mathbf{v}_t = \phi(t)$, with $\phi$ a Fourier-basis time encoder that smoothly embeds absolute dates so that similarities are sensitive to relative temporal distances (Sun et al., 16 Jul 2025).
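The joint semantic–temporal encoding can be sketched as follows. The dimensionality, day-based epoch, and geometric frequency schedule are illustrative assumptions, and `fourier_time_encoding` / `joint_embedding` are hypothetical names, not the paper's exact implementation:

```python
import numpy as np

def fourier_time_encoding(t_days: float, dim: int = 16,
                          max_period_days: float = 36500.0) -> np.ndarray:
    """Embed an absolute date (days since some epoch) so that inner products
    between encodings vary smoothly with relative temporal distance."""
    # Geometric ladder of frequencies from ~daily up to ~century-scale periods.
    freqs = 1.0 / (max_period_days ** (np.arange(dim // 2) / (dim // 2)))
    angles = t_days * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def joint_embedding(sem_vec: np.ndarray, t_days: float) -> np.ndarray:
    # v = [v_s || v_t]: concatenate the semantic and temporal parts.
    return np.concatenate([sem_vec, fourier_time_encoding(t_days)])
```

Because each coordinate pair contributes $\cos((t_i - t_j) f_k)$ to the dot product, dates one day apart score higher against each other than dates decades apart, which is the property the timeline retrieval relies on.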
2. DEU Extraction Pipeline
The extraction of DEUs from raw documents follows a rigorous multi-stage process:
- Document Chunking: Documents are divided into overlapping chunks (default 1200 tokens, overlap 64), with each chunk prepended by the document title for context preservation.
- Temporal Parsing: Absolute dates are detected and normalized to canonical timestamps. Relative or vague expressions are resolved by back-referencing the most recent absolute date in context. Intervals are indexed by their start date; absence of any anchor results in a static placeholder.
- Information Filtering: A scoring function counts four cues per sentence:
  - $\mathbb{1}_{\text{ent}}$: contains a named entity
  - $\mathbb{1}_{\text{pred}}$: eventive predicate present
  - $\mathbb{1}_{\text{quant}}$: includes results or quantifiers
  - $\mathbb{1}_{\text{time}}$: anchored at month-level precision

  Sentences with $\text{score}(s) = \mathbb{1}_{\text{ent}} + \mathbb{1}_{\text{pred}} + \mathbb{1}_{\text{quant}} + \mathbb{1}_{\text{time}} \geq 1$ are retained.
- Sentence Selection & Merging: Coreference resolution replaces pronouns. Sentences are merged only if their predicates refer to the same timestamp $t$ and describe a tightly coherent mini-narrative. No merging occurs across differing day-level timestamps.
The result is a collection of DEUs that is precisely indexed and time-anchored (Sun et al., 16 Jul 2025).
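The filtering stage can be sketched with simple heuristics. The regexes below are crude stand-ins for the NER model and LLM-based event detection an actual pipeline would use, and the predicate list is purely illustrative; only the retention rule (score ≥ 1) follows the pipeline described above:

```python
import re

MONTHS = (r"(January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

def info_score(sentence: str) -> int:
    """Count the four cues: entity, eventive predicate, quantifier, month-level anchor."""
    has_entity = bool(re.search(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", sentence))
    has_predicate = bool(re.search(
        r"\b(joined|signed|won|moved|became|founded|left|released|scored)\b", sentence))
    has_quantifier = bool(re.search(r"\b\d+(?:\.\d+)?%?\b", sentence))
    month_anchored = bool(re.search(MONTHS + r" \d{4}", sentence))
    return int(has_entity) + int(has_predicate) + int(has_quantifier) + int(month_anchored)

def keep(sentence: str) -> bool:
    # Retention rule from the pipeline: any single cue suffices.
    return info_score(sentence) >= 1
```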
3. Dynamic Event Graph Construction
DEUs are structured into an undirected, weighted event graph $G = (V, E)$:
| Component | Description |
|---|---|
| Node ($v_i$) | Represents DEU $e_i$; its embedding is stored in a vector index (e.g., NanoVectorDB). |
| Edge Construction | Edge $(i, j)$ exists iff: (a) the two DEUs share at least one named entity (entity co-occurrence) and (b) $\lvert t_i - t_j \rvert \le \Delta t$ (temporal proximity; tunable, default 1 year). |
| Edge Weighting | Weight $w_{ij}$ combines Jaccard similarity over the entity sets with a temporal decay in $\lvert t_i - t_j \rvert$. |
| Graph Sparsity (K-NN) | Top-$K$ edges per node kept by weight ($K \approx 16$–$32$), stored in NetworkX; vectors reside in the vector DB. |
Semantic–temporal linking ensures context integrity and efficient multi-hop temporal reasoning (Sun et al., 16 Jul 2025).
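A minimal sketch of the edge construction and sparsification. The exponential decay form, the `decay` constant, and the function names are assumptions; the source specifies only "Jaccard over entity sets and temporal decay" with a top-$K$ cutoff:

```python
import math
from itertools import combinations

def edge_weight(entities_i, entities_j, t_i, t_j,
                delta_days=365.0, decay=0.01):
    """Jaccard overlap of entity sets times an (assumed) exponential temporal
    decay; returns 0.0 when the co-occurrence or proximity condition fails."""
    shared = entities_i & entities_j
    if not shared or abs(t_i - t_j) > delta_days:
        return 0.0
    jaccard = len(shared) / len(entities_i | entities_j)
    return jaccard * math.exp(-decay * abs(t_i - t_j))

def sparsify_top_k(nodes, k=16):
    """nodes: {node_id: (entity_set, t_days)}. Keep the top-k weighted
    edges per node, as (weight, neighbor) lists."""
    candidates = {}
    for (a, (ea, ta)), (b, (eb, tb)) in combinations(nodes.items(), 2):
        w = edge_weight(ea, eb, ta, tb)
        if w > 0:
            candidates.setdefault(a, []).append((w, b))
            candidates.setdefault(b, []).append((w, a))
    return {n: sorted(ws, reverse=True)[:k] for n, ws in candidates.items()}
```

In a full implementation the resulting adjacency lists would be loaded into a NetworkX graph, with the node embeddings kept separately in the vector index.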
4. Timeline Retrieval and Time Chain-of-Thought (Time-CoT)
The query workflow utilizes DEUs for temporally-precise retrieval and reasoning:
- Query Embedding: Temporal intent is parsed from the query, forming a joint semantic–temporal query vector analogous to the DEU encoding.
- Seed Retrieval: Top-N DEUs are retrieved by cosine similarity, then reranked with a cross-encoder.
- Time-Aware Graph Traversal: Random walks of length $L$ from each seed, sampling the next hop in proportion to edge weights, yield candidate event paths.
- Timeline Assembly: Retrieved events are split into timestamped and static sets; the former are chronologically ordered, and biographical/stateless facts are appended last.
- Time-CoT Prompting: LLM receives structured timeline and a temporal reasoning template demanding stepwise inference:
- Identify query-scoped events
- Resolve pairwise event order/overlap
- Trace continuity vs. instantaneous states
- Heuristic reasoning by question type
- Cross-reference event citations in answers
This design provides temporally grounded, interpretable responses for complex information needs (Sun et al., 16 Jul 2025).
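The traversal and assembly steps above can be sketched as follows. The walk length, walk count, and function names are illustrative defaults, not the paper's settings; the adjacency format matches a top-$K$ sparsified event graph:

```python
import random

def time_aware_walks(adjacency, seeds, walk_len=3, n_walks=4, rng=None):
    """adjacency: {node: [(weight, neighbor), ...]}. From each seed, run
    several random walks, sampling the next hop proportionally to edge
    weight, and collect every visited event node."""
    rng = rng or random.Random(0)
    visited = set(seeds)
    for seed in seeds:
        for _ in range(n_walks):
            node = seed
            for _ in range(walk_len):
                nbrs = adjacency.get(node, [])
                if not nbrs:
                    break
                node = rng.choices([n for _, n in nbrs],
                                   weights=[w for w, _ in nbrs])[0]
                visited.add(node)
    return visited

def assemble_timeline(events):
    """events: {id: timestamp_or_None}. Timestamped events in chronological
    order; static (untimed) facts appended last."""
    timed = sorted((e for e, t in events.items() if t is not None),
                   key=lambda e: events[e])
    static = [e for e, t in events.items() if t is None]
    return timed + static
```

The assembled, ordered event list is then rendered into the structured timeline that the Time-CoT prompt consumes.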
5. Illustrative Examples and Case Studies
Several cases exemplify the interpretative and retrieval capacity of DEUs:
- Implicit Inference (TimeQA): For "Which team did Bruno Pereirinha play for in May 2013?", DEUs span his club movements; entity and temporal graph linkage enable the system to infer (via random walk and timeline) that in May 2013, he was still affiliated with S.S. Lazio.
- Event State Grounding (TempReason): For position-based queries ("Which position did Anne-Marie Descôtes hold in March 2012?"), DEUs indexing position intervals allow the retrieval pipeline to accurately resolve overlapping roles by time.
- Multi-Hop Reasoning (ComplexTR): For queries requiring chained inference (e.g., career progression across years), the event graph supports pathfinding across temporally and semantically linked DEUs, reconstructing multi-step temporal narratives (Sun et al., 16 Jul 2025).
The examples demonstrate DEUs’ flexibility: tightly scoped unit selection, explicit time encoding, and graph-based path exploration underpin accurate, explainable temporal question answering.
6. Significance and Implications in Temporal Reasoning
DEUs introduce minimal but explicit granularity for event and state representation, addressing a key limitation in prior retrieval-augmented generation frameworks—namely, the inability to capture event ordering, state continuity, and time-sensitive relationships. Their explicit construction and organization as nodes in a dynamic temporal graph support:
- Elimination of temporal ambiguity in retrieval
- Precise, interpretable multi-hop reasoning across event chains
- Efficient subgraph extraction in demanding temporal queries
A plausible implication is the broader applicability of the DEU abstraction to domains such as automated historiography, biographical reasoning, and longitudinal knowledge graph construction, wherever fine-grained temporal event tracking is essential (Sun et al., 16 Jul 2025).
7. Workflow Summary and Pseudocode
The practical procedure for DEU extraction and graph population is succinctly captured as:
```
for each document d in D:
    chunks = sliding_window(d, chunk_size=1200, overlap=64)
    for each chunk:
        cand_events = LLM.extract_event_sentences(chunk)
        for s in cand_events:
            t = normalize_time(s)
            if info_score(s) >= 1:
                s_prime = coref_resolve(s)
                if s_prime.timestamp == previous_DEU.timestamp:
                    merge_predicates(s_prime, previous_DEU)
                else:
                    emit DEU = {s_prime, t, new_event_id, d.id}
```
This workflow, comprising event detection, temporal normalization, content filtering, and graph construction, forms the core pipeline enabling DyG-RAG’s temporally grounded retrieval-augmented generation (Sun et al., 16 Jul 2025).