
Interpretable Vessel Trajectory Imputation (VISTA)

Updated 18 January 2026
  • The paper presents VISTA, a novel framework that combines structured data-derived knowledge with LLM-generated insights to impute missing vessel trajectory segments with explicit interpretability.
  • VISTA builds a structured knowledge graph linking vessel static attributes, behavior patterns, and imputation functions; its cues also support downstream anomaly detection and route planning.
  • Experiments on two real-world AIS datasets show accuracy improvements of 5–94% and computation-time reductions of 51–93% over the strongest baselines.

Knowledge-driven interpretable vessel trajectory imputation (VISTA) is a framework designed to recover missing segments in vessel Automatic Identification System (AIS) data with a particular emphasis on interpretability and knowledge transfer. VISTA aims to address the deficiencies of conventional black-box or statistical imputation approaches by generating structured, human-interpretable cues that explicitly justify each reconstructed trajectory segment and facilitate downstream analysis, including anomaly detection and route planning (Liu et al., 11 Jan 2026).

1. Underlying Knowledge Foundations

VISTA operationalizes "underlying knowledge" as the core resource for trajectory recovery, explicitly defined as the fusion of two key components:

  • Structured Data-derived Knowledge (SDK, $\mathcal{K}_d$): This is distilled from historical AIS records and encodes empirical vessel properties, behavior patterns, imputation-method choices, and their co-occurrences.
  • Implicit LLM Knowledge ($\mathcal{K}_\ell$): Commonsense maritime priors, navigation rules, and explanatory information acquired by LLMs pre-trained on extensive Internet corpora.

The unified knowledge construct is formalized as

$$\mathcal{U} = \Phi(\mathcal{K}_d, \mathcal{K}_\ell)$$

where $\Phi$ denotes the integration mechanism: SDK entries trigger relevant LLM queries, which retrieve textual explanations that ground imputation decisions.

2. Structured Data-derived Knowledge Representation

2.1 Knowledge Extraction Pipeline

Each vessel trajectory $\mathbf{X}_\iota$ is partitioned into $K$ fixed-size minimal segments $\mathbf{S}_\iota^k$ of length $m$, with a binary mask $\mathbf{M}_\iota \in \{0,1\}^K$ distinguishing complete segments from those with gaps. For every complete segment ($M_\iota^k = 1$), a tripartite knowledge unit $(v_s, v_b, v_f)$ is constructed:

  • Static Attribute ($v_s$): Vessel identifier $\iota$, status $\eta$, cargo type $\chi$, draught $d$, length $\ell$, width $\beta$, spatial context $\sigma$, and type $\kappa$.
  • Behavior Pattern ($v_b$): Tuple $(p^s, p^\theta, p^\psi, p^i, p^\tau)$ encoding speed, course, heading, LLM-inferred intent, and duration.
  • Imputation Function ($v_f$): An executable Python function $f$ and its LLM-generated description $d(f)$.
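The segmentation step and the tripartite knowledge unit can be sketched as plain Python data structures. This is a minimal sketch under stated assumptions, not the paper's code: a trajectory is modeled as a fixed-rate sequence in which a missing fix is `None`, a trailing partial segment is treated as incomplete, and behavior components are discretized string labels. The names `partition`, `StaticAttribute`, `BehaviorPattern`, and `ImputationFunction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A position fix (lat, lon) in degrees, or None when the AIS report is missing (assumption).
Fix = Optional[Tuple[float, float]]


def partition(traj: Sequence[Fix], m: int) -> Tuple[List[List[Fix]], List[int]]:
    """Split a trajectory into consecutive segments of length m and build the mask M.

    M[k] = 1 if segment k is complete (m fixes, none missing), else 0.
    """
    segments = [list(traj[i:i + m]) for i in range(0, len(traj), m)]
    mask = [int(len(seg) == m and all(p is not None for p in seg)) for seg in segments]
    return segments, mask


@dataclass(frozen=True)
class StaticAttribute:      # v_s; frozen so it is hashable and can serve as a graph node
    vessel_id: str          # iota
    status: str             # eta
    cargo_type: str         # chi
    draught: float          # d
    length: float           # ell
    width: float            # beta
    spatial_context: str    # sigma
    vessel_type: str        # kappa


@dataclass(frozen=True)
class BehaviorPattern:      # v_b = (p^s, p^theta, p^psi, p^i, p^tau)
    speed: str
    course: str
    heading: str
    intent: str             # LLM-inferred in VISTA
    duration: str


@dataclass(frozen=True)
class ImputationFunction:   # v_f = (f, d(f))
    name: str
    fn: Callable[..., List[Tuple[float, float]]]
    description: str        # LLM-generated description d(f)
```

For example, `partition([(55.0, 12.0), (55.1, 12.1), None, (55.3, 12.3)], 2)` returns the mask `[1, 0]`: only the first segment would yield a knowledge unit.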

2.2 Knowledge Graph Construction

SDK instances are organized in a Structured Data-derived Knowledge Graph (SD-KG), $\mathcal{G}_d = (\mathcal{V}_d, \mathcal{E}_d)$, with three node types:

$$\mathcal{V}_d = \mathcal{V}_s \cup \mathcal{V}_b \cup \mathcal{V}_f$$

Edges are defined as $\mathcal{E}_{sb} \subset \mathcal{V}_s \times \mathcal{V}_b$ and $\mathcal{E}_{bf} \subset \mathcal{V}_b \times \mathcal{V}_f$; the edge weights $w_{sb}$ and $w_{bf}$ are empirical co-occurrence counts in the training data. The adjacency matrices $A_{sb} \in \mathbb{N}^{|\mathcal{V}_s| \times |\mathcal{V}_b|}$ and $A_{bf} \in \mathbb{N}^{|\mathcal{V}_b| \times |\mathcal{V}_f|}$ encode these relationships.
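Because the edge weights are plain co-occurrence counts, the SD-KG can be accumulated in a single pass over the knowledge units. The sketch below is illustrative only: it assumes static-attribute nodes are represented as `"attribute=value"` strings and that each unit contributes one count per (static node, behavior) pair and one per (behavior, function) pair; `build_sdkg` and `to_matrix` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple

# One knowledge unit per complete segment: (static-attribute nodes, behavior node, function node).
Unit = Tuple[Iterable[Hashable], Hashable, Hashable]


def build_sdkg(units: Iterable[Unit]) -> Tuple[Counter, Counter]:
    """Accumulate co-occurrence counts w_sb(v_s, v_b) and w_bf(v_b, v_f)."""
    w_sb: Counter = Counter()
    w_bf: Counter = Counter()
    for static_nodes, behavior, function in units:
        for s in static_nodes:
            w_sb[(s, behavior)] += 1
        w_bf[(behavior, function)] += 1
    return w_sb, w_bf


def to_matrix(w: Counter, rows: Sequence[Hashable], cols: Sequence[Hashable]) -> List[List[int]]:
    """Densify a count table into an adjacency matrix (e.g. A_sb) for given node orderings."""
    return [[w[(r, c)] for c in cols] for r in rows]
```

Keeping the counts sparse, keyed by node pairs, avoids materializing the full $|\mathcal{V}_s| \times |\mathcal{V}_b|$ matrix; `to_matrix` recovers $A_{sb}$ or $A_{bf}$ when a dense form is needed.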

3. Integration and Utilization of Implicit LLM Knowledge

A pretrained LLM (e.g., Qwen-plus, GLM-4.5-th) augments SDK by synthesizing domain priors: regulatory frameworks, operational heuristics, and narrative explanations. LLM knowledge remains implicit in the model weights and is retrieved via templated prompts populated with descriptors from the SD-KG, producing a set of rationales and operational cues $\mathcal{K}_\ell$ aligned with SDK semantics. The result is a composite evidence base $\mathcal{U}$ for each imputation event, supporting human-readable interpretation and regulatory traceability.
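A templated prompt of this kind can be sketched with the standard-library `string.Template`. The template wording, the `build_behavior_prompt` helper, and the choice of descriptors (static nodes, candidate behaviors with their counts, boundary fixes) are hypothetical illustrations; the paper's actual prompts are not reproduced here, and no LLM is called.

```python
from string import Template
from typing import Iterable, Sequence, Tuple

# Hypothetical prompt template filled with SD-KG descriptors.
BEHAVIOR_PROMPT = Template(
    "Vessel context: $static.\n"
    "Candidate behaviors with SD-KG co-occurrence counts: $candidates.\n"
    "Boundary fixes before and after the gap: $before -> $after.\n"
    "Choose the most plausible behavior and justify it using maritime "
    "navigation rules and the counts above."
)


def build_behavior_prompt(static_nodes: Iterable[str],
                          candidates: Sequence[Tuple[str, int]],
                          before: Tuple[float, float],
                          after: Tuple[float, float]) -> str:
    """Render the behavior-selection prompt from SD-KG descriptors."""
    static = ", ".join(sorted(static_nodes))  # sorted for a deterministic prompt
    cand = "; ".join(f"{b} (n={n})" for b, n in candidates)
    return BEHAVIOR_PROMPT.substitute(static=static, candidates=cand,
                                      before=before, after=after)
```

Grounding the prompt in explicit counts lets the returned rationale cite the same statistics that produced the shortlist.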

4. Data–Knowledge–Data Loop and Imputation Algorithms

4.1 SD-KG Construction

The initial phase partitions vessel data into segments and processes them serially, extracting static and behavioral context via LLM abstractions, generating imputation functions, and assembling the SD-KG with weighted edges reflecting data frequencies.

4.2 Knowledge-Driven Trajectory Imputation

For each gap segment $\mathbf{S}_\iota^k$ (i.e., $M_\iota^k = 0$), the following steps are executed:

  1. Context Extraction: Retrieve the segment-boundary context ($u_\iota^{-}$, $u_\iota^{+}$), the static-attribute nodes $\mathcal{V}_s^k(\iota)$, and relevant behavior patterns.
  2. Behavior Estimation: Candidate behaviors $\mathcal{C}_b$ are extracted according to the $w_{sb}$ statistics, and each is scored by an add-one-smoothed empirical prior:

     $$\pi(v_b) = \frac{\prod_{v \in \mathcal{V}_s^k(\iota)} \big(w_{sb}(v, v_b) + 1\big)}{\sum_{v_b' \in \mathcal{C}_b} \prod_{v \in \mathcal{V}_s^k(\iota)} \big(w_{sb}(v, v_b') + 1\big)}$$

     The top-$K_b$ candidates are shortlisted; an LLM then selects the final behavior $v_b^*$ and generates an explanation $\mathcal{J}_b$.
  3. Method Selection: Candidate imputation methods from $A_{bf}$ are scored analogously and narrowed to $v_f^*$, with a rationale $\mathcal{J}_f$.
  4. Execution: The selected function $f^*$ is applied deterministically to the gap.
  5. Explanation Composer: A human-readable explanation $\mathcal{J}_h$ integrates regulatory, evidentiary, and operational context.
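The candidate extraction, the prior $\pi(v_b)$, and the top-$K_b$ shortlist can be sketched directly from the count table. This is a minimal sketch, assuming $w_{sb}$ is stored as a `Counter` keyed by (static node, behavior) pairs and that a behavior becomes a candidate when it shares at least one edge with the vessel's static nodes; the final LLM choice among the shortlist is not modeled, and the function names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def candidate_behaviors(w_sb: Counter, static_nodes: Iterable[Hashable]) -> List[Hashable]:
    """C_b: behaviors linked by at least one SD-KG edge to the vessel's static nodes."""
    nodes = set(static_nodes)
    return sorted({b for (s, b), n in w_sb.items() if s in nodes and n > 0}, key=str)


def behavior_prior(w_sb: Counter, static_nodes: Iterable[Hashable],
                   candidates: Iterable[Hashable]) -> Dict[Hashable, float]:
    """pi(v_b): product of add-one-smoothed counts, normalized over the candidate set."""
    nodes = list(static_nodes)
    scores: Dict[Hashable, int] = {}
    for b in candidates:
        prod = 1
        for v in nodes:
            prod *= w_sb[(v, b)] + 1  # Counter returns 0 for unseen pairs
        scores[b] = prod
    z = sum(scores.values())
    return {b: s / z for b, s in scores.items()} if z else {}


def shortlist(prior: Dict[Hashable, float], k_b: int) -> List[Tuple[Hashable, float]]:
    """Top-K_b behaviors that would be handed to the LLM for the final choice."""
    return heapq.nlargest(k_b, prior.items(), key=lambda kv: kv[1])
```

As a worked example with two static nodes, if a behavior co-occurs 3 and 1 times with them, its numerator is $(3+1)(1+1) = 8$; a competitor seen once and zero times gets $(1+1)(0+1) = 2$, so the priors are $8/10 = 0.8$ and $2/10 = 0.2$. The $+1$ keeps unseen pairs from zeroing out the product; with many attributes, summing logarithms instead of multiplying is a common numerical safeguard.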

No gradient-based optimization is performed; selection hinges on maximizing data-grounded priors and kinematic consistency.
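The Method Selection and Execution steps can be sketched in the same spirit. Here, as an assumption, candidate functions are scored by the add-one-smoothed count $w_{bf}(v_b^*, f) + 1$ and the argmax stands in for the LLM's final pick; `linear_fill` and `hold_last` are hypothetical stand-ins for LLM-generated imputation functions, and interpolating directly in latitude/longitude degrees is a simplification that ignores Earth curvature.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]                      # (lat, lon) in degrees
Imputer = Callable[[Point, Point, int], List[Point]]


def linear_fill(before: Point, after: Point, n: int) -> List[Point]:
    """Hypothetical candidate f: n evenly spaced fixes strictly between the boundary fixes."""
    return [
        (before[0] + (after[0] - before[0]) * i / (n + 1),
         before[1] + (after[1] - before[1]) * i / (n + 1))
        for i in range(1, n + 1)
    ]


def hold_last(before: Point, after: Point, n: int) -> List[Point]:
    """Hypothetical candidate f: repeat the last known fix (a stationary-vessel assumption)."""
    return [before] * n


def select_method(w_bf: Counter, behavior: str, candidates: Sequence[str]) -> str:
    """Argmax of add-one-smoothed w_bf(v_b*, f); a stand-in for the LLM's final choice."""
    return max(candidates, key=lambda f: w_bf[(behavior, f)] + 1)


def impute_gap(w_bf: Counter, behavior: str, registry: Dict[str, Imputer],
               before: Point, after: Point, n: int) -> Tuple[str, List[Point]]:
    """Select f* for the chosen behavior, then apply it deterministically."""
    name = select_method(w_bf, behavior, list(registry))
    return name, registry[name](before, after, n)
```

Since the selected function is ordinary code, the same inputs always reproduce the same reconstructed segment, which is what makes the imputation auditable.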

5. Workflow Management and Scalability

To operate efficiently at scale, VISTA incorporates a multi-layer workflow manager:

  • SD-KG Construction Manager: Orchestrates parallel extraction over a stack-based scheduler $\mathcal{S}_c$, with micro-batch processing and schema validation (Anomaly Guard). Node deduplication is handled by LLM-based canonicalization.
  • Trajectory Imputation Manager: Gap segments are queued on $\mathcal{S}_i$; LLM responses are validated for non-emptiness and executability, with retry-and-quarantine protocols for failures.

Synchronous batch barriers coordinate throughput and maintain logical segment ordering.
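A retry-and-quarantine loop with synchronous micro-batch barriers can be sketched with `concurrent.futures`. This is an illustrative sketch, not VISTA's manager: `query` is a hypothetical stand-in for an LLM call, "executable" is approximated by a successful `compile`, and the batch barrier comes from waiting on `ThreadPoolExecutor.map` for the whole batch, which also returns results in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple


def is_valid(response: Optional[str]) -> bool:
    """Non-empty and syntactically valid Python (a proxy for 'executable')."""
    if not response or not response.strip():
        return False
    try:
        compile(response, "<llm-response>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


def run_with_retry(task: Hashable, query: Callable[[Hashable], Optional[str]],
                   max_retries: int) -> Optional[str]:
    """Call query up to 1 + max_retries times; return the first valid response or None."""
    for _ in range(max_retries + 1):
        response = query(task)
        if is_valid(response):
            return response
    return None


def process_stack(tasks: Sequence[Hashable], query: Callable[[Hashable], Optional[str]],
                  batch_size: int = 4, max_retries: int = 2,
                  workers: int = 4) -> Tuple[Dict[Hashable, str], List[Hashable]]:
    """Drain a task stack in micro-batches; each batch acts as a synchronous barrier."""
    stack = list(reversed(tasks))  # pushed in reverse so pop() yields tasks in original order
    results: Dict[Hashable, str] = {}
    quarantine: List[Hashable] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while stack:
            batch = [stack.pop() for _ in range(min(batch_size, len(stack)))]
            # list(pool.map(...)) blocks until every task in the batch finishes
            # and keeps the outputs aligned with the batch order.
            outputs = list(pool.map(lambda t: run_with_retry(t, query, max_retries), batch))
            for task, out in zip(batch, outputs):
                if out is None:
                    quarantine.append(task)  # retries exhausted: set aside for inspection
                else:
                    results[task] = out
    return results, quarantine
```

Quarantining instead of failing the whole run keeps one malformed response from stalling the remaining segments.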

6. Experimental Evaluation and Benchmarking

6.1 Datasets and Metrics

VISTA was benchmarked on two real-world AIS datasets:

  • AIS-DK (March 2024): 10,000 vessel sequences, 2,000,000 records, 0.5 h average sequence duration, 348 vessels, Danish waters.
  • AIS-US (April 2024): 10,000 sequences, 2,000,000 records (the same as AIS-DK), 2.8 h average sequence duration, 4,723 vessels, US coastal waters.

Evaluation metrics include axis-wise MAE and RMSE for latitude ($\mathrm{MAE}_\phi$, $\mathrm{RMSE}_\phi$) and longitude ($\mathrm{MAE}_\lambda$, $\mathrm{RMSE}_\lambda$), and the mean Haversine distance (MHD) in kilometers.
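The metrics can be computed as follows. This sketch assumes axis errors are measured in degrees and uses a mean Earth radius of 6371.0 km for the Haversine distance; the constant and units the paper actually uses are assumptions here, and longitude wrap-around at ±180° is not handled.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]   # (latitude phi, longitude lambda) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius (assumed constant)


def mae(errors: Sequence[float]) -> float:
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors: Sequence[float]) -> float:
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, lam1, phi2, lam2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(pred: Sequence[Point], true: Sequence[Point]) -> Dict[str, float]:
    """Axis-wise MAE/RMSE (degrees) and mean Haversine distance (km) over imputed fixes."""
    d_phi = [p[0] - t[0] for p, t in zip(pred, true)]
    d_lam = [p[1] - t[1] for p, t in zip(pred, true)]
    return {
        "MAE_phi": mae(d_phi), "RMSE_phi": rmse(d_phi),
        "MAE_lambda": mae(d_lam), "RMSE_lambda": rmse(d_lam),
        "MHD_km": sum(haversine_km(p, t) for p, t in zip(pred, true)) / len(pred),
    }
```

As a sanity check, one degree of longitude along the equator comes out at about 111.19 km with this radius.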

6.2 Comparative Results

VISTA’s accuracy and efficiency were compared against rule-based (Linear Interpolation, Akima Spline, Kalman Filter), deep-learning (Multi-task AIS, MH-GIN), and LLM-based (KAMEL, Qwen-plus-th, etc.) baselines.

Table: MHD and Efficiency Comparison (Top-line Results)

| Method | AIS-DK MHD (km) | Improvement vs. best baseline | AIS-US MHD (km) | Improvement vs. best baseline |
|---|---|---|---|---|
| MH-GIN | 0.2836 | | 2.2164 | |
| VISTA | 0.2418 | +14.8% | 0.7945 | +64.2% |

Computation time (h:mm:ss):

| Method | AIS-DK time | AIS-US time |
|---|---|---|
| VISTA | 6:32:37 | 6:01:11 |
| Qwen-plus-th | 30:15:11 | 24:09:11 |
| GLM-4.5-th | 91:07:30 | 88:52:08 |

VISTA attained 5–94% improvement in accuracy and a 51–93% reduction in computation time over the strongest competitors.
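The relative figures can be rechecked from the MHD and time values listed in the table above. The helper names are hypothetical, and the time values are assumed to be in h:mm:ss format.

```python
def to_seconds(hms: str) -> int:
    """Parse an 'h:mm:ss' duration (hours may exceed 24)."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def reduction_pct(ours: float, baseline: float) -> float:
    """Relative reduction of `ours` versus `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Values transcribed from the table above.
times = {
    "AIS-DK": {"VISTA": "6:32:37", "Qwen-plus-th": "30:15:11", "GLM-4.5-th": "91:07:30"},
    "AIS-US": {"VISTA": "6:01:11", "Qwen-plus-th": "24:09:11", "GLM-4.5-th": "88:52:08"},
}
for dataset, row in times.items():
    ours = to_seconds(row["VISTA"])
    for name in ("Qwen-plus-th", "GLM-4.5-th"):
        cut = reduction_pct(ours, to_seconds(row[name]))
        print(f"{dataset} time vs {name}: -{cut:.1f}%")

print(f"AIS-DK MHD vs MH-GIN: {reduction_pct(0.2418, 0.2836):.1f}%")
print(f"AIS-US MHD vs MH-GIN: {reduction_pct(0.7945, 2.2164):.1f}%")
```

From these rows, the time reductions come to 78.4% and 92.8% on AIS-DK and 75.1% and 93.2% on AIS-US (versus Qwen-plus-th and GLM-4.5-th), consistent with the upper end of the 51–93% range; the lower end cannot be derived from the rows shown. The AIS-DK MHD improvement computes to 14.7% from the rounded table values, versus the reported 14.8%.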

7. Interpretability, Downstream Usage, and Qualitative Analysis

VISTA provides domain-aligned, explainable imputation through rationales at three levels: behavior ($\mathcal{J}_{\iota}^{k,b}$), method ($\mathcal{J}_{\iota}^{k,f}$), and a holistic human-readable explanation ($\mathcal{J}_{\iota}^{k,h}$), each citing empirical edge-weight statistics and regulatory context.
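A rationale that cites an edge-weight statistic needs a conditional frequency such as "the share of segments with a given static attribute that exhibit a given behavior". A minimal sketch, assuming $w_{sb}$ is a `Counter` keyed by (static node, behavior) pairs; the sentence template and function names are hypothetical, and in VISTA the wording itself comes from the LLM.

```python
from collections import Counter
from typing import Hashable


def conditional_share(w_sb: Counter, static_node: Hashable, behavior: Hashable) -> float:
    """w_sb(v, b) / sum over b' of w_sb(v, b'): how often v co-occurs with b."""
    total = sum(n for (s, _), n in w_sb.items() if s == static_node)
    return w_sb[(static_node, behavior)] / total if total else 0.0


def behavior_rationale(w_sb: Counter, static_node: str, behavior: str, context: str) -> str:
    """Hypothetical template for a behavior-level rationale citing an SD-KG statistic."""
    share = conditional_share(w_sb, static_node, behavior)
    return (f"{share:.0%} of historical segments with {static_node} show '{behavior}' "
            f"under {context}; this supports selecting '{behavior}' for the gap.")
```

Deriving the percentage from the graph rather than letting the LLM state it keeps the quoted figure traceable to the training data.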

7.1 Anomaly Detection

A case study illustrates the interpretive output: for a tanker missing 50 seconds of AIS data near Delaware Bay, VISTA correctly infers a stable-turn, lane-following maneuver, referencing the Delaware River Traffic Separation Scheme (TSS) and inbound-lane regulations. The reconstructed arc-shaped path is justified by an empirical frequency ("68% of cargo vessels perform this maneuver under TSS inbound-lane conditions") and by regulatory protocol, and anomalies are flagged only for nonconforming behavior.

7.2 Route Planning Support

Knowledge cues such as "port-entry procedures" or "queue-following" seed automated prioritization and simulation. Edges in the SD-KG capture context-conditional navigation patterns, supporting adaptive models of maritime traffic.

7.3 Qualitative Interpretations

Explanations bridge data science and operational practice, leveraging SDK-derived statistics and LLM-generated rationales to elucidate the decision path for each imputation event in a practitioner-oriented format.


VISTA thus combines structured data graph mining, LLM-driven knowledge retrieval, and rigorous workflow engineering to deliver trajectory imputation with explicit, actionable interpretability and strong support for downstream maritime analytics (Liu et al., 11 Jan 2026).
