Interpretable Vessel Trajectory Imputation (VISTA)

Updated 18 January 2026

The paper presents VISTA, a novel framework that uses structured data and LLM-generated insights to impute missing vessel trajectory segments with clear interpretability.
VISTA builds a structured knowledge graph linking vessel static attributes and behavior patterns to enable precise anomaly detection and effective route planning.
Experimental results on AIS datasets show VISTA achieves up to 94% accuracy improvements and significant computation time reduction over baseline methods.

Knowledge-driven interpretable vessel trajectory imputation (VISTA) is a framework designed to recover missing segments in vessel Automatic Identification System (AIS) data with a particular emphasis on interpretability and knowledge transfer. VISTA aims to address the deficiencies of conventional black-box or statistical imputation approaches by generating structured, human-interpretable cues that explicitly justify each reconstructed trajectory segment and facilitate downstream analysis, including anomaly detection and route planning (Liu et al., 11 Jan 2026).

1. Underlying Knowledge Foundations

VISTA operationalizes "underlying knowledge" as the core resource for trajectory recovery, explicitly defined as the fusion of two key components:

Structured Data-derived Knowledge (SDK, $\mathcal{K}_d$): Distilled from historical AIS records, it encodes empirical vessel properties, behavior patterns, imputation-method choices, and their co-occurrences.
Implicit LLM Knowledge ($\mathcal{K}_\ell$): Commonsense maritime priors, navigation rules, and explanatory information acquired by LLMs pre-trained on extensive Internet corpora.

The unified knowledge construct is formalized as

$\mathcal{U} = \Phi(\mathcal{K}_d, \mathcal{K}_\ell)$

where $\Phi$ denotes the integration mechanism. SDK triggers relevant LLM queries, retrieving textual explanations that ground imputation decisions.

2. Structured Data-derived Knowledge Representation

2.1 Knowledge Extraction Pipeline

Each vessel trajectory $\mathbf{X}_\iota$ is partitioned into $K$ fixed-size minimal segments $\mathbf{S}_\iota^k$ of length $m$, with a binary mask $\mathbf{M}_\iota\in\{0,1\}^K$ marking which segments are complete ($M_\iota^k=1$) and which contain gaps ($M_\iota^k=0$). For every complete segment, a tripartite knowledge unit $(v_s,v_b,v_f)$ is constructed:

Static Attribute ($v_s$): Vessel identifier $\iota$, status $\eta$, cargo type $\chi$, draught $d$, length $\ell$, width $\beta$, spatial context $\sigma$, and type $\kappa$.
Behavior Pattern ($v_b$): Tuple $(p^s,p^\theta,p^\psi,p^i,p^\tau)$ encoding speed, course, heading, LLM-inferred intent, and duration.
Imputation Function ($v_f$): An executable Python function $f$ and LLM-generated description $d(f)$.
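The segmentation and masking step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a missing AIS record is represented as `None`, that a trailing segment shorter than $m$ counts as incomplete, and that a knowledge unit can be held in a small dataclass; the names `partition` and `KnowledgeUnit` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (lat, lon), or None when the AIS record is missing (an assumed encoding).
Point = Optional[Tuple[float, float]]


def partition(trajectory: List[Point], m: int) -> Tuple[List[List[Point]], List[int]]:
    """Split a trajectory into consecutive segments of length m and build the mask."""
    if m <= 0:
        raise ValueError("segment length m must be positive")
    segments = [trajectory[i:i + m] for i in range(0, len(trajectory), m)]
    # A segment is complete only if it holds m records and none is missing.
    mask = [int(len(s) == m and all(p is not None for p in s)) for s in segments]
    return segments, mask


@dataclass(frozen=True)
class KnowledgeUnit:
    """Hypothetical container for one (v_s, v_b, v_f) triple."""
    static: Tuple[str, ...]    # v_s: static attribute values
    behavior: Tuple[str, ...]  # v_b: speed, course, heading, intent, duration labels
    function_name: str         # v_f: name of the imputation function


traj = [(55.0, 10.0), (55.1, 10.1), None, (55.3, 10.3), (55.4, 10.4), (55.5, 10.5)]
segments, mask = partition(traj, m=2)
print(mask)  # [1, 0, 1]
```

Only segments with mask value 1 would feed knowledge extraction; segments with mask value 0 are the gaps to be imputed later.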

2.2 Knowledge Graph Construction

SDK instances are organized in a Structured Data-derived Knowledge Graph (SD-KG), $\mathcal{G}_d=(\mathcal{V}_d, \mathcal{E}_d)$, with three node types: $\mathcal{V}_d = \mathcal{V}_s \cup \mathcal{V}_b \cup \mathcal{V}_f$. Edges are defined as $\mathcal{E}_{sb}\subset \mathcal{V}_s\times\mathcal{V}_b$ and $\mathcal{E}_{bf}\subset \mathcal{V}_b\times\mathcal{V}_f$; edge weights $w_{sb}$ and $w_{bf}$ correspond to empirical co-occurrence counts in the training data. The adjacency matrices $A_{sb}\in\mathbb{N}^{|\mathcal{V}_s|\times|\mathcal{V}_b|}$ and $A_{bf}\in\mathbb{N}^{|\mathcal{V}_b|\times|\mathcal{V}_f|}$ encode these relationships.
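As a rough illustration of how co-occurrence weights could be accumulated, the sketch below stores $w_{sb}$ and $w_{bf}$ as sparse counters keyed by node pairs instead of dense adjacency matrices. The function name `build_sd_kg`, the string labels, and the simplification of a behavior pattern to a single label are assumptions made for the example, not details from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# One knowledge unit: (static attribute values, behavior label, function name).
Unit = Tuple[Tuple[str, ...], str, str]


def build_sd_kg(units: Iterable[Unit]) -> Tuple[Counter, Counter]:
    """Count static-behavior and behavior-function co-occurrences."""
    w_sb: Counter = Counter()  # (v_s, v_b) -> count
    w_bf: Counter = Counter()  # (v_b, v_f) -> count
    for static_attrs, behavior, func in units:
        for attr in static_attrs:
            w_sb[(attr, behavior)] += 1
        w_bf[(behavior, func)] += 1
    return w_sb, w_bf


units = [
    (("type=tanker", "status=under_way"), "steady_course", "linear_interp"),
    (("type=tanker", "status=under_way"), "stable_turn", "arc_interp"),
    (("type=cargo", "status=under_way"), "steady_course", "linear_interp"),
]
w_sb, w_bf = build_sd_kg(units)
print(w_sb[("status=under_way", "steady_course")])  # 2
print(w_bf[("steady_course", "linear_interp")])     # 2
```

A `Counter` returns 0 for unseen pairs, which matches the reading of a missing edge as a zero co-occurrence count.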

3. Integration and Utilization of Implicit LLM Knowledge

A pretrained LLM (e.g., Qwen-plus, GLM-4.5-th) augments SDK by synthesizing domain priors: regulatory frameworks, operational heuristics, and narrative explanations. LLM knowledge remains implicit in the model weights and is retrieved via templated prompts using descriptors from SD-KG, producing a set of rationales and operational cues $\mathcal{K}_\ell$ aligned to SDK semantics. This yields a composite evidence base ($\mathcal{U}$) for each imputation event, promoting human-understandable interpretability and regulatory traceability.
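A templated prompt of this kind might look like the sketch below. The wording of the template, the field names, and the helper `build_prompt` are all hypothetical; the paper's actual prompts are not reproduced here, and no LLM call is made.

```python
from string import Template

# Hypothetical prompt template filled from SD-KG descriptors.
PROMPT = Template(
    "You are a maritime navigation assistant.\n"
    "Vessel attributes: $static\n"
    "Candidate behavior: $behavior\n"
    "Observed co-occurrence count: $count\n"
    "Explain briefly whether this behavior is plausible and cite any relevant navigation rules."
)


def build_prompt(static: dict, behavior: str, count: int) -> str:
    """Render the template; attributes are sorted so the prompt is deterministic."""
    static_text = ", ".join(f"{k}={v}" for k, v in sorted(static.items()))
    return PROMPT.substitute(static=static_text, behavior=behavior, count=count)


print(build_prompt({"type": "tanker", "draught": "9.5"}, "stable_turn", 12))
```

Keeping the rendering deterministic makes it easier to trace a rationale back to the exact graph descriptors that triggered it.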

4. Data–Knowledge–Data Loop and Imputation Algorithms

4.1 SD-KG Construction

The initial phase partitions vessel data into segments and processes them serially, extracting static and behavioral context via LLM abstractions, generating imputation functions, and assembling the SD-KG with weighted edges reflecting data frequencies.

4.2 Knowledge-Driven Trajectory Imputation

For each gap segment $\mathbf{S}_\iota^k$ , the following steps are executed:

Context Extraction: Retrieve segment-boundary context ($u_\iota^{-},u_\iota^{+}$), static attributes $\mathcal{V}_s^k(\iota)$, and relevant behavioral patterns.
Behavior Estimation: Candidate behaviors $\mathcal{C}_b$ are extracted according to $w_{sb}$ statistics; each is scored via an empirical prior with add-one smoothing:

$\pi(v_b) = \frac{\prod_{v\in \mathcal{V}_s^k(\iota)} (w_{sb}(v,v_b)+1)}{\sum_{v_b'\in\mathcal{C}_b} \prod_{v\in \mathcal{V}_s^k(\iota)} (w_{sb}(v,v_b')+1)}$

The top-$K_b$ candidates are shortlisted; an LLM selects the final behavior $v_b^*$ and generates an explanation $\mathcal{J}_b$.

Method Selection: Candidate imputation methods from $A_{bf}$ are similarly scored and narrowed to $v_f^*$; a rationale $\mathcal{J}_f$ is provided.
Execution: The chosen method $f^*$ is applied deterministically.
Explanation Composer: Human-readable explanation $\mathcal{J}_h$ integrates regulatory, evidentiary, and operational context.

No gradient-based optimization is performed; selection hinges on maximizing data-grounded priors and kinematic consistency.
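The behavior prior $\pi(v_b)$ used in the behavior-estimation step can be computed directly from the edge weights. The sketch below is a minimal illustration under assumed inputs: the dictionary `w_sb`, the attribute labels, and the helper names `behavior_priors` and `top_k` are hypothetical, and the final LLM selection among the shortlisted candidates is not modeled.

```python
from math import prod
from typing import Dict, List, Mapping, Sequence, Tuple


def behavior_priors(
    static_attrs: Sequence[str],
    candidates: Sequence[str],
    w_sb: Mapping[Tuple[str, str], int],
) -> Dict[str, float]:
    """pi(v_b): product of (w_sb + 1) over static attributes, normalized over candidates."""
    raw = {b: prod(w_sb.get((v, b), 0) + 1 for v in static_attrs) for b in candidates}
    total = sum(raw.values())
    return {b: score / total for b, score in raw.items()}


def top_k(priors: Mapping[str, float], k: int) -> List[str]:
    """Shortlist the k highest-scoring candidates."""
    return sorted(priors, key=priors.get, reverse=True)[:k]


w_sb = {
    ("type=tanker", "stable_turn"): 3,
    ("type=tanker", "steady_course"): 1,
    ("zone=TSS", "stable_turn"): 4,
}
priors = behavior_priors(["type=tanker", "zone=TSS"], ["stable_turn", "steady_course"], w_sb)
print(priors)            # stable_turn: 20/22, steady_course: 2/22
print(top_k(priors, 1))  # ['stable_turn']
```

In this toy case the unnormalized scores are $(3+1)(4+1)=20$ and $(1+1)(0+1)=2$, so the add-one term keeps the unseen pair from zeroing out a candidate entirely.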

5. Workflow Management and Scalability

To efficiently operate at scale, VISTA incorporates a multi-layer workflow manager:

SD-KG Construction Manager: Orchestrates parallel extraction over a stack-based scheduler $\mathcal{S}_c$ , with micro-batch processing and schema validation (Anomaly Guard). Deduplication is managed via LLM-based canonicalization of nodes.
Trajectory Imputation Manager: Gap segments are queued on $\mathcal{S}_i$ ; responses are validated for non-emptiness and executability, with retry-and-quarantine protocols in place.

Synchronous batch barriers coordinate throughput and maintain logical segment ordering.
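The validate, retry, and quarantine behavior described for the imputation manager might be sketched as follows. This is an assumption-laden illustration: "executability" is approximated by checking that the response compiles as Python, the `generate` callback stands in for an LLM call, and the names `validate` and `process_with_retry` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def validate(source: str) -> bool:
    """Accept a response only if it is non-empty and compiles as Python source."""
    if not source.strip():
        return False
    try:
        compile(source, "<llm-response>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


def process_with_retry(
    items: Sequence[str],
    generate: Callable[[str, int], str],
    max_attempts: int = 3,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Retry each item up to max_attempts times; quarantine items that never validate."""
    accepted: List[Tuple[str, str]] = []
    quarantined: List[str] = []
    for item in items:
        for attempt in range(max_attempts):
            response = generate(item, attempt)
            if validate(response):
                accepted.append((item, response))
                break
        else:
            # No attempt produced a valid response.
            quarantined.append(item)
    return accepted, quarantined


def fake_generate(item: str, attempt: int) -> str:
    """Stand-in for an LLM call: gap-2 succeeds on retry, gap-3 never does."""
    if item == "gap-1":
        return "def f(a, b):\n    return a"
    if item == "gap-2":
        return "" if attempt == 0 else "x = 1"
    return "def broken(:"


accepted, quarantined = process_with_retry(["gap-1", "gap-2", "gap-3"], fake_generate)
print([item for item, _ in accepted])  # ['gap-1', 'gap-2']
print(quarantined)                     # ['gap-3']
```

Processing items in input order and appending results in that order keeps the logical segment ordering intact even when some items need retries.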

6. Experimental Evaluation and Benchmarking

6.1 Datasets and Metrics

VISTA was benchmarked on two real-world AIS datasets:

AIS-DK (March 2024): 10,000 vessel sequences, 2,000,000 records, 0.5 h average per sequence, 348 vessels, Danish waters.
AIS-US (April 2024): 10,000 vessel sequences, 2,000,000 records, 2.8 h average per sequence, 4,723 vessels, US coastal waters.

Evaluation metrics include axis-wise MAE and RMSE ($\mathrm{MAE}_\phi$, $\mathrm{RMSE}_\phi$, $\mathrm{MAE}_\lambda$, $\mathrm{RMSE}_\lambda$) and mean Haversine distance (MHD) in kilometers.
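These metrics can be computed as in the sketch below, which assumes the usual convention that $\phi$ denotes latitude and $\lambda$ longitude (both in degrees), and uses a mean Earth radius of 6371 km. The function names and the dictionary keys are illustrative choices, not the paper's evaluation code.

```python
import math
from typing import Dict, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius (assumed constant)


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance between two points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def evaluate(pred: Sequence[LatLon], true: Sequence[LatLon]) -> Dict[str, float]:
    """Axis-wise MAE/RMSE in degrees and mean Haversine distance in km."""
    if not true or len(pred) != len(true):
        raise ValueError("pred and true must be non-empty and of equal length")
    n = len(true)
    d_lat = [p[0] - t[0] for p, t in zip(pred, true)]
    d_lon = [p[1] - t[1] for p, t in zip(pred, true)]
    return {
        "mae_lat": sum(abs(e) for e in d_lat) / n,
        "rmse_lat": math.sqrt(sum(e * e for e in d_lat) / n),
        "mae_lon": sum(abs(e) for e in d_lon) / n,
        "rmse_lon": math.sqrt(sum(e * e for e in d_lon) / n),
        "mhd_km": sum(haversine_km(p, t) for p, t in zip(pred, true)) / n,
    }


metrics = evaluate(pred=[(0.0, 0.0), (1.0, 1.0)], true=[(0.0, 0.0), (1.0, 2.0)])
print(metrics["mae_lon"])  # 0.5
```

The axis-wise errors are in degrees while MHD is in kilometers, so the two kinds of metric are not directly comparable in magnitude.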

6.2 Comparative Results

VISTA’s accuracy and efficiency were compared against rule-based (Linear Interpolation, Akima Spline, Kalman Filter), deep-learning (Multi-task AIS, MH-GIN), and LLM-based (KAMEL, Qwen-plus-th, etc.) baselines.

Table: MHD and Efficiency Comparison (Top-line Results)

Method	AIS-DK MHD (km)	Improvement vs. best baseline	AIS-US MHD (km)	Improvement vs. best baseline
MH-GIN	0.2836	—	2.2164	—
VISTA	0.2418	+14.8%	0.7945	+64.2%

Method	AIS-DK Time	AIS-US Time
VISTA	6:32:37	6:01:11
Qwen-plus-th	30:15:11	24:09:11
GLM-4.5-th	91:07:30	88:52:08

VISTA attained 5–94% improvement in accuracy and a 51–93% reduction in computation time over the strongest competitors.
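The relative figures for the rows shown in the tables can be recomputed as a sanity check. The sketch below assumes the timing entries are in h:mm:ss format, which the source does not state explicitly, and it uses the rounded values printed in the tables, so small deviations from the reported percentages are to be expected. The helper names are hypothetical.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


def to_seconds(hms: str) -> int:
    """Parse an 'h:mm:ss' string (assumed format) into seconds."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# MHD improvement over MH-GIN, from the rounded table values.
mhd_dk = relative_reduction(0.2836, 0.2418)
mhd_us = relative_reduction(2.2164, 0.7945)
print(f"{mhd_dk:.1f}% {mhd_us:.1f}%")

# Time reduction on AIS-DK relative to the two LLM baselines listed.
vista_dk = to_seconds("6:32:37")
time_vs_qwen = relative_reduction(to_seconds("30:15:11"), vista_dk)
time_vs_glm = relative_reduction(to_seconds("91:07:30"), vista_dk)
print(f"{time_vs_qwen:.1f}% {time_vs_glm:.1f}%")
```

Under these assumptions the AIS-DK time reduction against GLM-4.5-th comes out near the upper end of the reported range, while the lower end presumably refers to baselines not listed in the excerpted table.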

7. Interpretability, Downstream Usage, and Qualitative Analysis

VISTA provides domain-aligned, explicable imputation via rationales at the behavior ($\mathcal{J}_{\iota}^{k,b}$), method ($\mathcal{J}_{\iota}^{k,f}$), and holistic human-readable explanation ($\mathcal{J}_{\iota}^{k,h}$) levels, each citing empirical edge-weight statistics and regulatory context.

7.1 Anomaly Detection

A case study illustrates the interpretive output. For a tanker missing 50 seconds of AIS data near Delaware Bay, VISTA correctly infers a stable-turn, lane-following maneuver, referencing the Delaware River TSS and inbound-lane regulations. The reconstructed arc-shaped path is justified by empirical frequency ("68% of cargo vessels perform this maneuver under TSS inbound-lane conditions") and by regulatory protocol; anomalies are flagged only for nonconforming behavior.

7.2 Route Planning Support

Knowledge cues—such as "port-entry procedures" or "queue-following"—seed automated prioritization and simulation. Edges in the SD-KG highlight context-conditional navigational patterns, facilitating adaptive models of maritime traffic.

7.3 Qualitative Interpretations

Explanations bridge data science and operational practice, leveraging SDK-derived statistics and LLM-generated rationales to elucidate the decision path for each imputation event in a practitioner-oriented format.

VISTA thus combines structured data graph mining, LLM-driven knowledge retrieval, and rigorous workflow engineering to deliver trajectory imputation with explicit, actionable interpretability and strong support for downstream maritime analytics (Liu et al., 11 Jan 2026).

References

Liu et al. (2026). VISTA: Knowledge-Driven Interpretable Vessel Trajectory Imputation via Large Language Models.
