Spatio-Temporal Graph Anomaly Benchmark

Updated 24 January 2026

Spatio-Temporal Graph Anomaly Benchmark is a testbed for detecting anomalies in maritime traffic using dynamic spatio-temporal graphs.
It converts raw AIS vessel trajectories into graph structures with node, edge, and graph-level annotations to capture varied anomaly scales.
LLM-driven agents synthesize realistic anomalies, and baseline models demonstrate improved detection using combined temporal and spatial features.

The Spatio-Temporal Graph Anomaly Benchmark is a structured testbed for evaluating anomaly detection methods in maritime traffic scenarios characterized by sparse, irregular, and non-grid spatio-temporal structures. It extends the Open Maritime Traffic Analysis Dataset (OMTAD) and introduces graph-centric representations and ground-truth anomaly annotations at multiple granularities, facilitating rigorous assessment of node-level, edge-level, and graph-level anomaly detection approaches in complex, real-world, non-grid environments (Kim et al., 23 Dec 2025).

1. Construction of the Spatio-Temporal Graph Benchmark

The foundation of the benchmark is the transformation of raw AIS (Automatic Identification System) vessel trajectories into dynamic spatio-temporal graphs. Each input trajectory is defined as

$\bigl\{T_i = \bigl(x_{i,1},x_{i,2},\dots,x_{i,w}\bigr)\bigr\}_{i=1}^{N}$

where $x_{i,t} = (\text{MMSI}_i,\, t,\, \mathrm{lat}_{i,t},\, \mathrm{lon}_{i,t},\, \mathrm{SOG}_{i,t},\, \mathrm{COG}_{i,t})$ comprises vessel identity (MMSI), temporal index, geolocation, speed over ground (SOG), and course over ground (COG), across a window of length $w$.

These trajectories are mapped to a spatio-temporal graph

$\mathcal{G} = (V, E)$

with nodes $V = \{v_i^t\}$, each representing a vessel at a particular timestamp, and edges

$E = E_{\mathrm{spatial}} \cup E_{\mathrm{temporal}}$

Spatial edges $E_{\mathrm{spatial}}$ connect nodes within a distance threshold $\delta$ or through OPTICS clustering, encoding local proximity:

$E_{\mathrm{spatial}} = \bigl\{(v_i^t,v_j^t)\mid \mathrm{dist}(x_{i,t}, x_{j,t}) < \delta\bigr\}$

Temporal edges $E_{\mathrm{temporal}}$ link consecutive states of each vessel:

$E_{\mathrm{temporal}} = \bigl\{(v_i^{t-1}, v_i^t)\mid i=1,\ldots,N;\, t=2,\ldots,w\bigr\}$
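The edge definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's implementation: it assumes trajectories already resampled onto a shared time grid, uses the haversine great-circle distance for $\mathrm{dist}$ (the source does not say which distance is used), covers only the distance-threshold variant (not OPTICS), and indexes time from 0 rather than 1. The names `haversine_km`, `build_graph`, and `delta_km` are hypothetical.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def build_graph(trajectories, delta_km):
    """Build nodes, spatial edges, and temporal edges.

    trajectories: list of N lists, each holding w (lat, lon) tuples on a shared time grid.
    A node is the pair (i, t): vessel i at time step t.
    """
    n = len(trajectories)
    w = len(trajectories[0])
    nodes = [(i, t) for i in range(n) for t in range(w)]

    # Spatial edges: same time step, distance below the threshold (assumed haversine).
    spatial = []
    for t in range(w):
        for i, j in combinations(range(n), 2):
            lat_i, lon_i = trajectories[i][t]
            lat_j, lon_j = trajectories[j][t]
            if haversine_km(lat_i, lon_i, lat_j, lon_j) < delta_km:
                spatial.append(((i, t), (j, t)))

    # Temporal edges: consecutive states of the same vessel.
    temporal = [((i, t - 1), (i, t)) for i in range(n) for t in range(1, w)]
    return nodes, spatial, temporal

# Toy input (made-up coordinates): vessels 0 and 1 are about 1.1 km apart
# at step 0 and about 56 km apart at step 1; vessel 2 is far from both.
toy = [
    [(-32.00, 115.00), (-32.00, 115.10)],
    [(-32.01, 115.00), (-32.50, 115.10)],
    [(-33.00, 116.00), (-33.00, 116.10)],
]
nodes, e_spatial, e_temporal = build_graph(toy, delta_km=5.0)
```

With a 5 km threshold, the toy input produces a single spatial edge (vessels 0 and 1 at the first step) and one temporal edge per vessel.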

Anomalies are considered at three granularities:

Node-level: Set $A_{\mathrm{node}}\subseteq V$ (per-vessel state anomalies).
Edge-level: Set $A_{\mathrm{edge}}\subseteq E$ (anomalous vessel interactions).
Graph-level: Set $A_{\mathrm{graph}} \subseteq \mathcal{G}$ (whole-snapshot anomalies), with $\mathcal{G}$ read here as the collection of graph snapshots.

Formally, the benchmark defines:

Node anomaly detection as $f_{\mathrm{node}}: V \to \{0,1\}$,
Edge anomaly detection as $f_{\mathrm{edge}}: E \to \{0,1\}$,
Graph anomaly detection as $f_{\mathrm{graph}}: \mathcal{G} \to \{0,1\}$ (Kim et al., 23 Dec 2025).

2. Dataset Composition and Feature Engineering

The underlying data comes from OMTAD, covering 2018–2020 in Western Australia. It comprises $N = 19{,}124$ trajectories (Cargo: 14,384; Tanker: 4,020; Fishing: 466; Passenger: 254). Spatio-temporal graphs are generated over sliding time windows (e.g., $w=12$ hours), each containing $k$ vessels.

Key statistics:

Number of graph snapshots: $M\approx\frac{N\times(\text{time span in hours})}{\text{stride}}$
Each graph: $|V|=k\times w$, $|E_{\mathrm{spatial}}|\approx w \binom{k}{2}$ (dense clustering, i.e., every vessel pair connected at every time step), $|E_{\mathrm{temporal}}| = k\times (w-1)$
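As a quick sanity check on the per-graph counts, the formulas can be evaluated for concrete values. The sketch below is illustrative only; the function name `graph_size` and the example values $k=5$, $w=12$ are assumptions, and the spatial count is the dense-case figure, which acts as an upper bound when only some pairs fall within the threshold.

```python
from math import comb

def graph_size(k, w):
    """Return (|V|, dense-case |E_spatial|, |E_temporal|) for k vessels and w time steps."""
    n_nodes = k * w                 # one node per vessel per time step
    max_spatial = w * comb(k, 2)    # every vessel pair connected at every step
    n_temporal = k * (w - 1)        # one edge between consecutive states of each vessel
    return n_nodes, max_spatial, n_temporal

print(graph_size(5, 12))  # (60, 120, 55)
```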

Features:

Nodes: MMSI (embedded), timestamp, latitude, longitude, SOG, COG, ΔSOG/Δt, ΔCOG/Δt, wind/wave/current bins, visibility proxy ($d_v \approx 12$–$14$).
Edges (optional): Distance, relative bearing, ΔSOG, ΔCOG ($d_e \approx 3$–$5$).
Preprocessing: continuous features standardized, angular features as $\sin/\cos$ , cyclical encoding for time.
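The preprocessing steps listed above might look roughly as follows. This is a sketch under assumptions: the source does not specify the standardization statistics or the period used for time encoding, so population mean and standard deviation and a 24-hour cycle are assumed here, and the helper names are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Z-score a list of continuous values (zero mean, unit population std)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sd for v in values]

def encode_angle_deg(angle_deg):
    """Map an angle such as COG to (sin, cos) so that 359 and 1 degrees end up close."""
    rad = math.radians(angle_deg)
    return math.sin(rad), math.cos(rad)

def encode_hour(hour, period=24.0):
    """Cyclical encoding of time of day; the 24-hour period is an assumption."""
    phase = 2 * math.pi * (hour % period) / period
    return math.sin(phase), math.cos(phase)

sog_z = standardize([8.0, 10.0, 12.0])   # speeds in knots (made-up values)
cog_near_north_a = encode_angle_deg(359.0)
cog_near_north_b = encode_angle_deg(1.0)
```

The point of the sin/cos pair is that headings on either side of north, which are far apart as raw degrees, map to nearby points on the unit circle.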

3. Anomaly Synthesis Methodology

Anomaly generation is organized around two specialized LLM-based agents:

Trajectory Synthesizer: Densifies sparse subgraphs by generating “virtual neighbors” through perturbation of the focal vessel’s kinematic parameters (latitude, longitude, SOG, COG), or by allocating real nearby vessels when available.
Anomaly Injector: Converts high-level anomaly semantics (e.g., “sudden evasive zig-zag,” “risky rendezvous,” “group loitering”) into explicit graph modifications with corresponding labels at the node ($z_v \in \{0,1\}$), edge ($z_e \in \{0,1\}$), and graph ($y_G \in \{0,1\}$) levels.

Types of Injected Anomalies

Node-level (kinematic): Speed spikes or drops, implemented as

$a_i^* = \mu_a + k\,\sigma_a,\quad \omega_i^* = \mu_\omega + k\,\sigma_\omega\quad (k>3)$

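for acceleration $a$ and turn rate $\omega$. Here $\mu$ and $\sigma$ denote the mean and standard deviation of the corresponding quantity, and the multiplier $k$ is unrelated to the vessel count $k$ used in Section 2.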
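A node-level injection of this kind can be illustrated with a short sketch. It assumes that the mean and standard deviation are estimated from the same series being perturbed (the source does not say where they come from), and the function name `inject_spike` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def inject_spike(values, index, k=4.0):
    """Replace values[index] with mu + k*sigma and return (new_values, node_labels).

    mu and sigma are estimated from `values` itself, which is an assumption.
    """
    if k <= 3:
        raise ValueError("the formula above requires k > 3")
    mu, sigma = mean(values), pstdev(values)
    perturbed = list(values)
    perturbed[index] = mu + k * sigma
    labels = [0] * len(values)
    labels[index] = 1  # node-level label z_v = 1 for the injected state
    return perturbed, labels

# Made-up acceleration series for one vessel.
accel = [0.0, 0.2, -0.2, 0.1, -0.1]
accel_anom, z_v = inject_spike(accel, index=2, k=4.0)
```

The same helper applies to a turn-rate series; a speed drop rather than a spike would need the mirrored offset, which the formula above does not spell out.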

Edge-level (interaction): Near-collision events (forcing trajectories to cross within distance $\epsilon$), pursuit/evasion via relative heading manipulation.
Graph-level (group): Loitering (clusters reduce speed below threshold for extended period), convoy deviation (subgroups diverge from main routes).

4. Task Definition and Evaluation Procedure

The benchmark defines (semi-)supervised binary classification tasks for all three anomaly granularities. Typical data splits are 70% training, 10% validation, and 20% test, partitioned by trajectory.
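Splitting by trajectory rather than by individual node or snapshot keeps all states of one trajectory in the same partition. A minimal sketch of such a split follows; the shuffling, seeding, and rounding choices are assumptions, as is the function name `split_by_trajectory`.

```python
import random

def split_by_trajectory(traj_ids, seed=0, frac_train=0.7, frac_val=0.1):
    """Partition unique trajectory ids into train/val/test (default 70/10/20)."""
    ids = sorted(set(traj_ids))
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(frac_train * len(ids))
    n_val = round(frac_val * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train_ids, val_ids, test_ids = split_by_trajectory(range(100))
```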

Loss Functions:

Graph-level uses binary cross entropy:

$\mathcal{L}(\theta) = -\sum_{i=1}^{N_{\mathrm{train}}}\left[y_i\log p_\theta(y_i=1|G_i) + (1-y_i)\log p_\theta(y_i=0|G_i)\right]$

Analogous definitions apply at node and edge levels.
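The graph-level loss can be evaluated directly from predicted probabilities. The sketch below computes the summed (not averaged) binary cross entropy exactly as written above; the clipping constant `eps` is an added numerical safeguard and not part of the source formula.

```python
import math

def bce_loss(labels, probs, eps=1e-12):
    """Summed binary cross entropy; probs[i] is the predicted p(y_i = 1 | G_i)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0); assumption, not in the source
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

uninformed = bce_loss([1, 0], [0.5, 0.5])   # equals 2 * ln 2
confident = bce_loss([1, 0], [0.9, 0.1])    # lower loss for better predictions
```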

Evaluation Metrics:

For each granularity:

Precision: $\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$
Recall: $\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$
$F_1$-score: $F_1 = 2\, \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
AUC: $\mathrm{AUC} = \int_0^1 \mathrm{TPR}(t)\, d(\mathrm{FPR}(t))$
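These four metrics can be computed without external libraries. The sketch below is illustrative: the AUC is obtained through the pairwise-ranking (Mann-Whitney) formulation, which equals the area under the ROC curve, and the zero-denominator conventions and the 0.5 decision threshold in the example are assumptions.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels; undefined ratios return 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and anomaly scores for six items.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
prec, rec, f1 = precision_recall_f1(y_true, y_pred)
auc_value = roc_auc(y_true, scores)
```

On this toy input, precision is 0.75, recall is 1.0, and 8 of the 9 positive-negative pairs are ranked correctly.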

5. Baseline Methods and Empirical Results

A suite of baseline models is evaluated on the benchmark:

Temporal baselines: LSTM (on individual trajectories) and Transformer (self-attention over time).
Spatio-temporal GNNs (ST-GNN):
- LSTM+GNN (temporal LSTM + per-step graph convolution)
- Transformer+GNN (self-attention in time + spatial graph convolution)
- Standard STGCN, DCRNN, Graph WaveNet.

Table: Node, Edge, and Graph-Level AUC Performance

Model	Node AUC	Edge AUC	Graph AUC (r_traj=0.1)	Graph AUC (r_traj=0.5)
Transformer	0.68	0.70	0.62	0.68
LSTM	0.65	0.67	0.60	0.65
LSTM+GNN	0.82	0.84	0.78	0.85
Transformer+GNN	0.85	0.87	0.81	0.88

Key findings:

Including spatial graph structure (+GNN) yields large improvements at all granularities: in the table above, each +GNN model gains 17–20 AUC points over its temporal-only counterpart.
The Transformer+GNN hybrid consistently achieves the highest AUC across tasks, demonstrating complementarity between temporal self-attention and spatial graph convolution.
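Anomaly detection appears more tractable at the graph level (where aggregation mitigates noise) than at node or edge granularity, although the table above supports this only for the +GNN models at $r_{\mathrm{traj}}=0.5$; at $r_{\mathrm{traj}}=0.1$ graph-level AUC is lower than both node and edge AUC (Kim et al., 23 Dec 2025).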
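The size of the +GNN improvement can be recomputed from the table. The snippet below simply copies the reported AUC values and takes differences in AUC points; the dictionary layout and the name `gain_points` are illustrative choices.

```python
# AUC values copied from the table: (node, edge, graph r_traj=0.1, graph r_traj=0.5)
auc = {
    "Transformer":     (0.68, 0.70, 0.62, 0.68),
    "LSTM":            (0.65, 0.67, 0.60, 0.65),
    "LSTM+GNN":        (0.82, 0.84, 0.78, 0.85),
    "Transformer+GNN": (0.85, 0.87, 0.81, 0.88),
}

def gain_points(base, hybrid):
    """AUC gain of the +GNN model over its temporal-only base, in points (x100)."""
    return [round(100 * (h - b)) for b, h in zip(auc[base], auc[hybrid])]

print(gain_points("LSTM", "LSTM+GNN"))                # [17, 17, 18, 20]
print(gain_points("Transformer", "Transformer+GNN"))  # [17, 17, 19, 20]
```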

6. Significance and Outlook

The Spatio-Temporal Graph Anomaly Benchmark systematically addresses challenges of anomaly detection in maritime environments lacking fixed spatial anchors and exhibiting varied anomaly scales. By repurposing OMTAD as a multi-granularity anomaly detection benchmark, augmenting it with LLM-driven synthesizers and injectors, and establishing rigorous task definitions, it provides a reproducible platform for advancing graph-based methodologies in non-grid, dynamic domains. Preliminary baselines already indicate that exploiting spatio-temporal relational structure substantially improves detection capability, as the AUC results in Section 5 show. The benchmark framework invites further innovation in modeling, unsupervised detection, and dynamic graph learning tailored to highly irregular spatio-temporal systems (Kim et al., 23 Dec 2025).


References (1)

Kim et al. (23 Dec 2025). Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection.
