
Multi-Graph & Semantic-Temporal Decoupling

Updated 8 February 2026
  • Multi-Graph and Semantic-Temporal-Entity Decoupling frameworks separate distinct graph modalities to enhance prediction accuracy and model interpretability.
  • They use modality-specific encoders and alignment modules to disentangle structural, temporal, and semantic features, mitigating information conflation.
  • Empirical studies report significant improvements in metrics like MRR and AUC across temporal knowledge graphs, dynamic attributed graphs, and agentic memory systems.

Multi-graph and semantic-temporal-entity decoupling frameworks address the challenge of modeling complex, dynamic graph-structured data where multiple modalities—such as structural, temporal, and semantic (textual or attribute-based) signals—interact in distributed or orthogonal latent spaces. By explicitly disentangling these aspects at the architectural level, modern models enable more robust reasoning, mitigate information conflation, and unlock superior predictive accuracy on temporal knowledge graph extrapolation, dynamic attributed graphs, and agentic memory systems.

1. Formalisms and Definitions for Multi-Modal and Multi-Graph Structures

A multi-graph paradigm constructs separate but interrelated graph views or channels, each capturing a distinct modality or semantic relation. In Dynamic Text-Attributed Graphs (DyTAGs), as defined by MoMent, the graph is formalized as

G = (V, E, T, \mathcal{D}, \mathcal{R}, \mathcal{L})

where V is the node set, E \subseteq V \times V the edge set, T a totally ordered timestamp set, \mathcal{D} the node textual descriptions, \mathcal{R} the free-text edge attributes, and \mathcal{L} the edge categories. Each edge is a quintuple (u, v, t_{uv}, r_{uv}, l_{uv}) (Xu et al., 27 Feb 2025).
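This formalization can be made concrete with a minimal Python sketch (class and field names are illustrative, not taken from the MoMent codebase):

```python
from dataclasses import dataclass, field

@dataclass
class DyTAGEdge:
    """One edge of a Dynamic Text-Attributed Graph: the quintuple (u, v, t_uv, r_uv, l_uv)."""
    u: int    # source node in V
    v: int    # destination node in V
    t: float  # timestamp from the totally ordered set T
    r: str    # free-text edge attribute from R
    l: str    # edge category from L

@dataclass
class DyTAG:
    """G = (V, E, T, D, R, L), with V implied by the keys of node_texts."""
    node_texts: dict                              # D: node id -> textual description
    edges: list = field(default_factory=list)     # E, stored as quintuples

    def add_edge(self, u, v, t, r, l):
        self.edges.append(DyTAGEdge(u, v, t, r, l))
```

The quintuple structure is what lets downstream encoders route t_{uv} to the temporal channel and r_{uv} to the semantic channel without conflating them.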

This approach generalizes to agentic memory (MAGMA), which instantiates distinct graphs on the same event-node set:

  • Semantic graph G_s (edges for conceptual similarity),
  • Temporal graph G_t (edges for chronological ordering),
  • Causal graph G_c (edges inferred via conditional plausibility),
  • Entity graph G_e (event-to-entity incidence) (Jiang et al., 6 Jan 2026).
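The shared-node, multi-view construction can be sketched as a small container that keeps one event set and a separate adjacency structure per view (a hypothetical illustration, not MAGMA's implementation):

```python
from collections import defaultdict

class MultiViewMemory:
    """Distinct graph views (semantic/temporal/causal/entity) over one shared event-node set."""
    VIEWS = ("semantic", "temporal", "causal", "entity")

    def __init__(self):
        self.events = {}                                      # shared node set
        self.adj = {v: defaultdict(set) for v in self.VIEWS}  # one edge set per view

    def add_event(self, eid, payload):
        self.events[eid] = payload

    def link(self, view, a, b):
        assert view in self.VIEWS and a in self.events and b in self.events
        self.adj[view][a].add(b)

    def neighbours(self, view, eid):
        return self.adj[view][eid]
```

Because every view indexes the same event ids, a retrieval policy can hop between views mid-traversal (e.g. semantic hop, then temporal hop) without any node remapping.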

Temporal Knowledge Graph models (LMS, DiMNet, GTRL) similarly construct multiple latent graphs—the entity-entity structural graph at each timestamp, cross-timestamp correlation modules, and abstract graphs for semantic periodicity or entity groups (Zhang et al., 2023, Dong et al., 20 May 2025, Tang et al., 2023).

2. Decoupling Methodologies: Node- and View-Centric Encoders

Decoupling is operationalized via non-shared, modality-specific encoders, allowing each channel to extract information optimized for its modality. MoMent employs:

  • Temporal encoders: Map event times to \mathbb{R}^{d_t} via sinusoidal encoding \phi(\tau), pass through a 2-layer FFN, and aggregate over the timestamp sequence using multi-head self-attention (Xu et al., 27 Feb 2025).
  • Semantic encoders: Project node text d_i with a BERT-base PLM, then reduce to \mathbb{R}^{d_s} via an MLP and aggregate with self-attention for global semantic context.
  • Structural encoders: Aggregate local neighbors using any dynamic graph model (e.g., TGAT), inputting edge text and edge time offset concatenated vectors.

Each encoder yields modality-specific latent tokens z_i^{(t)}, z_i^{(s)}, and z_i^{(g)}, which are kept distinct until late fusion. Similar separation exists in LMS (EGL for temporal-structural, UGL for semantic, TGL for timestamp semantics) and DiMNet (multi-hop, multi-span GNNs per timestamp plus active/stable semantic factors) (Zhang et al., 2023, Dong et al., 20 May 2025).
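The first stage of the temporal encoder, the sinusoidal time encoding \phi(\tau), can be sketched in NumPy as follows (the geometric frequency schedule is an assumption borrowed from Transformer-style positional encodings, not necessarily MoMent's exact choice; d_t is assumed even):

```python
import numpy as np

def sinusoidal_time_encoding(timestamps, d_t=8):
    """Map event times tau to R^{d_t} via fixed sinusoidal features phi(tau)."""
    taus = np.asarray(timestamps, dtype=float)[:, None]          # (n, 1)
    half = d_t // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))          # geometric frequency ladder
    angles = taus * freqs                                        # (n, d_t/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (n, d_t)
```

In the full encoder these features would then pass through the 2-layer FFN and be aggregated across the timestamp sequence with multi-head self-attention.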

MAGMA strictly separates graph instantiation (memory representation) from graph traversal (retrieval policy \pi), enforcing architectural decoupling between what is encoded (nodes/edges in semantic, temporal, causal, entity views) and how information is retrieved or reasoned over (Jiang et al., 6 Jan 2026).

3. Cross-Modal Alignment and Adaptive Fusion

Decoupled representations risk mapping information to disjoint latent spaces, impeding information synthesis. Alignment modules are thus introduced:

  • MoMent applies a symmetric alignment loss,

\mathcal{L}_{\text{align}} = \frac{1}{2} \| z^{(t)} - W z^{(s)} \|_2^2 + \frac{1}{2} \| z^{(s)} - W^T z^{(t)} \|_2^2,

enforcing linear-mapping correspondence between temporal and semantic tokens. This is theoretically shown to bound the difference in conditional mutual information with respect to the target label, ensuring temporal-semantic consistency (Xu et al., 27 Feb 2025).
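As a concrete illustration, the symmetric alignment loss can be written in a few lines of NumPy for a single pair of tokens (variable names follow the equation; W is the learned linear map):

```python
import numpy as np

def symmetric_alignment_loss(z_t, z_s, W):
    """L_align = 1/2 ||z_t - W z_s||^2 + 1/2 ||z_s - W^T z_t||^2."""
    a = z_t - W @ z_s    # temporal token predicted from the semantic token
    b = z_s - W.T @ z_t  # semantic token predicted back via the transpose
    return 0.5 * float(a @ a) + 0.5 * float(b @ b)
```

The loss is zero exactly when the two tokens are related by the linear map in both directions, which is the correspondence the theorem on conditional mutual information relies on.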

  • LMS fuses EGL (temporal-structural) and UGL (semantic, query-relevant) representations with a learned adaptive gate:

GE_t = \sigma(W_7 \Theta_e) \odot E_t + (1 - \sigma(W_7 \Theta_e)) \odot UE_t

These gates decouple and reweight contributions from different views per entity (Zhang et al., 2023).
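A minimal NumPy sketch of this gating for one entity (shapes and the parameter names W7 and theta_e follow the equation but are otherwise illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(E_t, UE_t, theta_e, W7):
    """GE_t = sigma(W7 theta_e) * E_t + (1 - sigma(W7 theta_e)) * UE_t (elementwise)."""
    g = sigmoid(W7 @ theta_e)          # per-dimension gate in (0, 1)
    return g * E_t + (1.0 - g) * UE_t
```

A saturated gate (g near 1) passes the temporal-structural view E_t through unchanged; a neutral gate (g = 0.5) averages the two views.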

  • DiMNet introduces an attention-based disentangler that separates node semantics into active factors (capturing semantic change) and stable factors (encoding semantic inertia). The gating mechanism then dynamically blends new and old features per span and timestamp (Dong et al., 20 May 2025).

All these systems conclude with late fusion, typically via an MLP adaptor over concatenated latent tokens, learning view/feature importance jointly with downstream tasks.
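The late-fusion step can be sketched as a plain NumPy MLP adaptor over the concatenated modality tokens (dimensions and weight names are illustrative, not any particular model's configuration):

```python
import numpy as np

def late_fusion(z_t, z_s, z_g, W1, b1, W2, b2):
    """MLP adaptor over concatenated modality tokens; view importance is learned jointly."""
    z = np.concatenate([z_t, z_s, z_g])  # tokens stay separate until this point
    h = np.maximum(0.0, W1 @ z + b1)     # ReLU hidden layer
    return W2 @ h + b2                   # task head (e.g. link-prediction logits)
```

Keeping the tokens separate until this final concatenation is what distinguishes late fusion from early fusion, where modalities are mixed before encoding.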

4. Semantic-Temporal-Entity Decoupling in Temporal Knowledge Graphs

Advanced TKG reasoning frameworks operationalize semantic-temporal-entity decoupling as follows:

  • LMS deploys three modules: EGL (structural, temporal evolution), UGL (semantic, cross-timestamp), and TGL (periodic timestamp semantics), with a per-entity adaptive gating fusion (Zhang et al., 2023).
  • GTRL forms groups of entities (using sparsemax soft assignments) to capture semantic proximity, then aggregates via hierarchical GCNs (group-level, then entity-level), plus a decay-aware GRU for temporal dependencies. Semantic grouping, structural message passing, and temporal encoding remain disentangled until event prediction (Tang et al., 2023).
  • DiMNet disentangles temporal (sequence of graphs), structural (multi-hop span GNNs), semantic (active factor for change), and entity (stable factor for persistence). A smoothness loss regularizes the stable component, ensuring it tracks slow-changing semantic subspaces (Dong et al., 20 May 2025).
  • MAGMA maps memory entries to orthogonal semantic, temporal, causal, and entity graphs, with policy-guided retrievals (beam search traversals) weighted by query intent for interpretable, flexibly aligned context construction. Memory encoding ("write") and retrieval ("read") logic are strictly separated (Jiang et al., 6 Jan 2026).
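The sparsemax soft assignment used by GTRL projects raw group-affinity scores onto the probability simplex, zeroing out weak memberships so each entity belongs to only a few groups. A minimal NumPy sketch of the standard sparsemax projection (a generic implementation, not GTRL's code):

```python
import numpy as np

def sparsemax(z):
    """Sparse alternative to softmax: Euclidean projection of z onto the simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                  # sort scores in descending order
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    support = z_sorted + 1.0 / k > cssv / k      # indices kept in the support
    k_max = k[support][-1]
    tau = (cssv[k_max - 1] - 1.0) / k_max        # threshold so the output sums to 1
    return np.maximum(z - tau, 0.0)
```

Unlike softmax, sparsemax can output exact zeros, which is why the group assignments stay interpretable and disentangled from the structural and temporal modules.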

5. Theoretical Guarantees and Empirical Results

Adding, aligning, and adaptively fusing decoupled modalities provably increases the information available for prediction:

  • MoMent's Theorem 1 states I(Z;Y) \geq I(z_g;Y) + \beta \min\{ H(Y \mid z_g), H(z_t, z_s \mid z_g) \} for some \beta \in (0,1], where Z is the late-fused feature, showing that additional modalities strictly boost mutual information with the task label (Xu et al., 27 Feb 2025).
  • In DiMNet, the design is motivated to avoid the conflation of stable and new semantics seen in previous RNN- or subgraph-based approaches. Disentanglement is directly regularized via the smoothness loss on stable factors (Dong et al., 20 May 2025).
  • Empirical gains are consistent: MoMent reports up to +33.62% improvement (AUC/AP, DyGFormer), LMS leads by up to +4.82% on ICEWS14s, and DiMNet achieves a 22.7% MRR boost, all relative to prior state-of-the-art baselines (Xu et al., 27 Feb 2025, Zhang et al., 2023, Dong et al., 20 May 2025). MAGMA achieves +18.6% LLM-as-judge score versus the nearest memory system on LoCoMo (Jiang et al., 6 Jan 2026).
  • Ablation studies in LMS confirm that dropping any modality/module or naive fusion materially reduces performance (up to −5.7% MRR) (Zhang et al., 2023).

6. Applications and Broader Impact

Multi-graph and semantic-temporal-entity decoupling impact a range of domains:

  • Temporal knowledge graph extrapolation and event reasoning (LMS, DiMNet, GTRL).
  • Dynamic text-attributed networks, including citation, transaction, and communication graphs (MoMent).
  • Long-context and agentic memory retrieval for LLMs and autonomous agents (MAGMA).
  • Any setting requiring interpretable reasoning paths, intent-aligned evidence selection, or long-horizon inference, leveraging multi-modal graph-structured data.

By structurally separating the sources of information and tailoring alignment/fusion strategies, these models deliver (1) higher predictive accuracy, (2) improved robustness to noisy or missing modalities, and (3) transparent, query-adaptive, and theoretically grounded mechanisms for multi-view learning. A plausible implication is that semantic-temporal-entity decoupling provides a principled route to scalable, interpretable, and flexible dynamic graph learning architectures across both symbolic and deep learning regimes.
