Interpretable Vessel Trajectory Imputation (VISTA)
- The paper presents VISTA, a novel framework that uses structured data and LLM-generated insights to impute missing vessel trajectory segments with clear interpretability.
- VISTA builds a structured knowledge graph linking vessel static attributes and behavior patterns to enable precise anomaly detection and effective route planning.
- Experimental results on AIS datasets show VISTA achieves up to 94% accuracy improvements and significant computation time reduction over baseline methods.
Knowledge-driven interpretable vessel trajectory imputation (VISTA) is a framework designed to recover missing segments in vessel Automatic Identification System (AIS) data with a particular emphasis on interpretability and knowledge transfer. VISTA aims to address the deficiencies of conventional black-box or statistical imputation approaches by generating structured, human-interpretable cues that explicitly justify each reconstructed trajectory segment and facilitate downstream analysis, including anomaly detection and route planning (Liu et al., 11 Jan 2026).
1. Underlying Knowledge Foundations
VISTA operationalizes "underlying knowledge" as the core resource for trajectory recovery, explicitly defined as the fusion of two key components:
- Structured Data-derived Knowledge (SDK): Distilled from historical AIS records, it encodes empirical vessel properties, behavior patterns, imputation-method choices, and their co-occurrences.
- Implicit LLM Knowledge: Commonsense maritime priors, navigation rules, and explanatory information acquired by LLMs pre-trained on extensive Internet corpora.
The unified knowledge construct is formalized as the combination of these two components through an integration mechanism: SDK triggers relevant LLM queries, which retrieve textual explanations that ground imputation decisions.
2. Structured Data-derived Knowledge Representation
2.1 Knowledge Extraction Pipeline
Each vessel trajectory is partitioned into fixed-length minimal segments, with a binary mask indicating which segments are complete. For every complete segment, a tripartite knowledge unit is constructed:
- Static Attribute: Vessel identifier, status, cargo type, draught, length, width, spatial context, and vessel type.
- Behavior Pattern: A tuple encoding speed, course, heading, LLM-inferred intent, and duration.
- Imputation Function: An executable Python function paired with an LLM-generated description.
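The tripartite knowledge unit can be pictured as three small records. The following is a minimal sketch, not the paper's schema: all class and field names are hypothetical, and `linear_interpolation` is only one simple example of what an executable imputation function might look like.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass(frozen=True)
class StaticAttribute:
    # Field names are illustrative assumptions, mirroring the attributes listed above.
    vessel_id: str
    status: str
    cargo_type: str
    draught_m: float
    length_m: float
    width_m: float
    spatial_context: str
    vessel_type: str


@dataclass(frozen=True)
class BehaviorPattern:
    speed: str
    course: str
    heading: str
    intent: str    # LLM-inferred in the framework; a plain label here
    duration: str


@dataclass(frozen=True)
class ImputationFunction:
    description: str  # LLM-generated in the framework; hand-written here
    func: Callable[[Point, Point, int], List[Point]]


@dataclass(frozen=True)
class KnowledgeUnit:
    static: StaticAttribute
    behavior: BehaviorPattern
    method: ImputationFunction


def linear_interpolation(start: Point, end: Point, n_missing: int) -> List[Point]:
    """Return n_missing evenly spaced points strictly between start and end."""
    step = 1.0 / (n_missing + 1)
    return [
        (start[0] + (end[0] - start[0]) * step * i,
         start[1] + (end[1] - start[1]) * step * i)
        for i in range(1, n_missing + 1)
    ]


linear_method = ImputationFunction(
    description="Straight-line fill between the two boundary positions.",
    func=linear_interpolation,
)
```

Keeping the function and its description in one record means the explanation can always be shown next to the method that produced a reconstructed segment.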
2.2 Knowledge Graph Construction
SDK instances are organized in a Structured Data-derived Knowledge Graph (SD-KG) with three node types, one per component of the knowledge unit: static attributes, behavior patterns, and imputation functions. Two edge types link these nodes, and their weights correspond to empirical co-occurrence counts in the training data. Two adjacency matrices, one per edge type, encode these relationships.
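Co-occurrence edge weights of this kind can be accumulated with plain counters. The sketch below assumes that one edge type links static attributes to behavior patterns and the other links behavior patterns to imputation functions; that pairing, the string keys, and the function name `build_sd_kg` are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

# Each knowledge unit is reduced to three hashable keys for counting.
UnitKeys = Tuple[str, str, str]  # (static_attribute, behavior, method)


def build_sd_kg(units: Iterable[UnitKeys]) -> Tuple[Counter, Counter]:
    """Count how often each (attribute, behavior) and (behavior, method) pair co-occurs."""
    attr_behavior: Counter = Counter()
    behavior_method: Counter = Counter()
    for attr, behavior, method in units:
        attr_behavior[(attr, behavior)] += 1
        behavior_method[(behavior, method)] += 1
    return attr_behavior, behavior_method


units = [
    ("tanker", "steady_transit", "linear"),
    ("tanker", "steady_transit", "linear"),
    ("tanker", "stable_turn", "spline"),
]
attr_behavior, behavior_method = build_sd_kg(units)
```

Each counter is a sparse form of an adjacency matrix: a missing key is a zero entry.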
3. Integration and Utilization of Implicit LLM Knowledge
A pretrained LLM (e.g., Qwen-plus, GLM-4.5-th) augments SDK by synthesizing domain priors (regulatory frameworks, operational heuristics, and narrative explanations). LLM knowledge remains implicit in the model weights and is retrieved via templated prompts built from SD-KG descriptors, producing a set of rationales and operational cues aligned with SDK semantics. This yields a composite evidence base for each imputation event, promoting human-understandable interpretability and regulatory traceability.
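A templated prompt built from SD-KG descriptors might look like the sketch below. The wording of the template and its field names are assumptions made for illustration; the source does not give the actual prompts, and no LLM is called here.

```python
from string import Template

# Hypothetical prompt template; the placeholders are filled from SD-KG descriptors.
BEHAVIOR_PROMPT = Template(
    "Vessel type: $vessel_type\n"
    "Spatial context: $spatial_context\n"
    "Candidate behavior: $behavior "
    "(seen $count times with these attributes in historical AIS data)\n"
    "Explain, citing relevant navigation rules, whether this behavior is plausible."
)


def render_behavior_prompt(vessel_type: str, spatial_context: str,
                           behavior: str, count: int) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return BEHAVIOR_PROMPT.substitute(
        vessel_type=vessel_type,
        spatial_context=spatial_context,
        behavior=behavior,
        count=count,
    )


prompt = render_behavior_prompt("tanker", "TSS inbound lane", "stable turn", 42)
```

Embedding the empirical count in the prompt is one way to let the rationale cite data-derived evidence alongside the model's own priors.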
4. Data–Knowledge–Data Loop and Imputation Algorithms
4.1 SD-KG Construction
The initial phase partitions vessel data into segments and processes them serially, extracting static and behavioral context via LLM abstractions, generating imputation functions, and assembling the SD-KG with weighted edges reflecting data frequencies.
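The segmentation step can be sketched as follows. It assumes that a missing AIS record is represented as `None` and that a trailing segment shorter than the chosen length counts as incomplete; both conventions, and the name `partition`, are illustrative choices rather than details from the source.

```python
from typing import List, Optional, Sequence, Tuple

Record = Optional[Tuple[float, float]]  # None marks a missing AIS record (assumption)


def partition(trajectory: Sequence[Record], seg_len: int) -> Tuple[List[List[Record]], List[bool]]:
    """Split a trajectory into fixed-length segments and flag the complete ones."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    segments = [list(trajectory[i:i + seg_len])
                for i in range(0, len(trajectory), seg_len)]
    # A segment is complete when it has full length and no missing record.
    mask = [len(seg) == seg_len and all(rec is not None for rec in seg)
            for seg in segments]
    return segments, mask


trajectory = [(55.0, 10.0), (55.1, 10.1), None, (55.3, 10.3), (55.4, 10.4)]
segments, mask = partition(trajectory, seg_len=2)
```

Segments flagged `True` would feed knowledge extraction, while the remaining ones are the gaps to be imputed later.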
4.2 Knowledge-Driven Trajectory Imputation
For each gap segment, the following steps are executed:
- Context Extraction: Retrieve the segment-boundary context, the static attributes, and relevant behavioral patterns.
- Behavior Estimation: Candidate behaviors are extracted according to SD-KG co-occurrence statistics, and each is scored via empirical priors. The top-scoring candidates are shortlisted; an LLM selects the final behavior and generates an explanation for the choice.
- Method Selection: Candidate imputation methods are similarly scored and narrowed to a shortlist; a rationale is provided for the selected method.
- Execution: The chosen method is applied deterministically.
- Explanation Composer: Human-readable explanation integrates regulatory, evidentiary, and operational context.
No gradient-based optimization is performed; selection hinges on maximizing data-grounded priors and kinematic consistency.
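The shortlist-then-select pattern in the steps above can be sketched as follows. The prior used here is a plain relative frequency over co-occurrence counts, which is an assumption and not the paper's scoring formula; `pick_highest_prior` is a hypothetical stand-in for the LLM selection step, and the edge pairing (attribute to behavior, behavior to method) is also assumed.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, float]  # (node key, empirical prior)


def shortlist(edge_counts: Counter, source: str, k: int) -> List[Candidate]:
    """Rank nodes linked to `source` by relative co-occurrence frequency; keep the top k."""
    linked = {tgt: n for (src, tgt), n in edge_counts.items() if src == source}
    total = sum(linked.values())
    if total == 0:
        return []
    # Sort by descending count, then by name so ties are resolved deterministically.
    ranked = sorted(linked.items(), key=lambda item: (-item[1], item[0]))
    return [(tgt, n / total) for tgt, n in ranked[:k]]


def pick_highest_prior(candidates: List[Candidate]) -> str:
    """Stand-in for the LLM choice: take the candidate with the largest prior."""
    return max(candidates, key=lambda c: c[1])[0]


def choose_behavior_and_method(
    attr: str,
    attr_behavior: Counter,
    behavior_method: Counter,
    k: int,
    chooser: Callable[[List[Candidate]], str] = pick_highest_prior,
) -> Optional[Tuple[str, str]]:
    """Select a behavior for the attribute, then a method for that behavior."""
    behaviors = shortlist(attr_behavior, attr, k)
    if not behaviors:
        return None
    behavior = chooser(behaviors)
    methods = shortlist(behavior_method, behavior, k)
    if not methods:
        return None
    return behavior, chooser(methods)
```

Passing the chooser as a parameter keeps the data-grounded ranking separate from the selection step, so a real LLM call could replace the stand-in without touching the scoring code.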
5. Workflow Management and Scalability
To efficiently operate at scale, VISTA incorporates a multi-layer workflow manager:
- SD-KG Construction Manager: Orchestrates parallel extraction over a stack-based scheduler, with micro-batch processing and schema validation (Anomaly Guard). Deduplication is managed via LLM-based canonicalization of nodes.
- Trajectory Imputation Manager: Gap segments are queued on a scheduler; responses are validated for non-emptiness and executability, with retry-and-quarantine protocols in place.
Synchronous batch barriers coordinate throughput and maintain logical segment ordering.
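A retry-and-quarantine check of this kind could be sketched as below. It treats "executable" as "non-empty and compiles as Python", which is an assumption about what the validation involves; the function names and the quarantine list are hypothetical.

```python
from typing import Callable, List, Optional


def is_valid_response(source: str) -> bool:
    """Accept a response only if it is non-empty and compiles as Python code."""
    if not source or not source.strip():
        return False
    try:
        # compile() parses the code without running it.
        compile(source, "<llm-response>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


def process_segment(
    segment_id: str,
    ask: Callable[[str], str],
    max_attempts: int,
    quarantine: List[str],
) -> Optional[str]:
    """Query up to max_attempts times; quarantine the segment if no response is valid."""
    for _ in range(max_attempts):
        response = ask(segment_id)
        if is_valid_response(response):
            return response
    quarantine.append(segment_id)
    return None
```

Compiling without executing catches malformed output cheaply; running the generated function safely would need separate sandboxing, which this sketch does not cover.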
6. Experimental Evaluation and Benchmarking
6.1 Datasets and Metrics
VISTA was benchmarked on two real-world AIS datasets:
- AIS-DK (March 2024): 10,000 vessel sequences, 2,000,000 records, 0.5 h average per sequence, 348 vessels, Danish waters.
- AIS-US (April 2024): 10,000 vessel sequences, 2,000,000 records, 2.8 h average per sequence, 4,723 vessels, US coastal waters.
Evaluation metrics include axis-wise MAE and RMSE, and mean Haversine distance (MHD) in kilometers.
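These metrics follow standard definitions, and a minimal sketch is given below. It assumes positions are (latitude, longitude) pairs in degrees and uses a mean Earth radius of 6371.0088 km; the paper's exact conventions may differ.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)
Point = Tuple[float, float]  # (latitude, longitude) in degrees


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error along one axis."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error along one axis."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points in kilometers."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_haversine_km(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Mean Haversine distance (MHD) between predicted and true positions."""
    return sum(haversine_km(p, t) for p, t in zip(pred, true)) / len(true)
```

MAE and RMSE are computed separately per coordinate axis, while MHD gives one distance in kilometers per point pair, which makes it easier to interpret operationally.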
6.2 Comparative Results
VISTA’s accuracy and efficiency were compared against rule-based (Linear Interpolation, Akima Spline, Kalman Filter), deep-learning (Multi-task AIS, MH-GIN), and LLM-based (KAMEL, Qwen-plus-th, etc.) baselines.
Table: MHD Comparison (Top-line Results)

| Method | AIS-DK MHD (km) | AIS-DK improvement vs. best baseline | AIS-US MHD (km) | AIS-US improvement vs. best baseline |
|---|---|---|---|---|
| MH-GIN | 0.2836 | n/a | 2.2164 | n/a |
| VISTA | 0.2418 | +14.8% | 0.7945 | +64.2% |

Table: Efficiency Comparison (Runtime, hh:mm:ss)

| Method | AIS-DK Time | AIS-US Time |
|---|---|---|
| VISTA | 6:32:37 | 6:01:11 |
| Qwen-plus-th | 30:15:11 | 24:09:11 |
| GLM-4.5-th | 91:07:30 | 88:52:08 |

Across metrics and datasets, VISTA attained a 5–94% improvement in accuracy and a 51–93% reduction in computation time over the strongest competitors.
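The percentages can be checked against the tabulated values with a relative-reduction calculation. The sketch below assumes the runtimes are in hours:minutes:seconds; because the published MHD values are rounded, a recomputed percentage may differ from the listed one in the last digit.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds (assumed time format)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# MHD improvement of VISTA over MH-GIN, from the tabulated values.
mhd_dk = relative_reduction(0.2836, 0.2418)
mhd_us = relative_reduction(2.2164, 0.7945)

# Runtime reduction of VISTA on AIS-DK relative to the two LLM baselines.
time_vs_qwen = relative_reduction(to_seconds("30:15:11"), to_seconds("6:32:37"))
time_vs_glm = relative_reduction(to_seconds("91:07:30"), to_seconds("6:32:37"))
```

The same helper applies to any error or runtime metric where lower is better.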
7. Interpretability, Downstream Usage, and Qualitative Analysis
VISTA provides domain-aligned, explicable imputation via rationales at three levels: behavior, method, and a holistic human-readable explanation, each citing empirical edge-weight statistics and regulatory context.
7.1 Anomaly Detection
A case study illustrates interpretive output: For a tanker missing 50 seconds of AIS data near Delaware Bay, VISTA correctly infers a stable-turn, lane-following maneuver, referencing the Delaware River TSS and inbound-lane regulations. The reconstructed arc-shaped path is justified based on empirical frequency ("68% of cargo vessels perform this maneuver under TSS inbound-lane conditions") and regulatory protocol, flagging anomalies only for nonconforming behavior.
7.2 Route Planning Support
Knowledge cues—such as "port-entry procedures" or "queue-following"—seed automated prioritization and simulation. Edges in the SD-KG highlight context-conditional navigational patterns, facilitating adaptive models of maritime traffic.
7.3 Qualitative Interpretations
Explanations bridge data science and operational practice, leveraging SDK-derived statistics and LLM-generated rationales to elucidate the decision path for each imputation event in a practitioner-oriented format.
VISTA thus combines structured data graph mining, LLM-driven knowledge retrieval, and rigorous workflow engineering to deliver trajectory imputation with explicit, actionable interpretability and strong support for downstream maritime analytics (Liu et al., 11 Jan 2026).