- The paper introduces an unsupervised CAE framework that transforms vessel trajectories into low-dimensional features, enabling rapid and efficient similarity computation.
- It leverages image projection with spline interpolation and a hybrid L1+SSIM loss to robustly handle noisy, irregular AIS data.
- Experimental results demonstrate speedups of up to two orders of magnitude over DTW and Fréchet distance, together with markedly improved clustering quality.
Unsupervised CAE-Based Vessel Trajectory Similarity Computation
The paper "An Unsupervised Learning Method with Convolutional Auto-Encoder for Vessel Trajectory Similarity Computation" (2101.03169) presents an unsupervised methodology leveraging Convolutional Auto-Encoders (CAE) for calculating similarities between large-scale vessel trajectories, with a direct focus on computational efficiency and robustness to non-uniform sampling and noise.
Motivation and Background
Vessel trajectory similarity computation is foundational for numerous maritime applications including behavior modeling, anomaly detection, clustering, route planning, and maritime surveillance. Conventional distance metrics—such as dynamic time warping (DTW), Fréchet distance, edit distance, and other shape- or warping-based techniques—are widely used but suffer from high computational complexity and sensitivity to artifacts and non-uniform temporal sampling, which are intrinsic to automatic identification system (AIS) data. Deep learning methods, while successful in related temporal domains, had not been adequately explored for large-scale trajectory similarity computation of vessel data prior to this work.
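To make the computational argument concrete, the following sketch implements classical dynamic time warping for two point sequences. Filling the full cost table makes a single comparison O(n·m), so all-pairs similarity over N trajectories scales as O(N²·n·m), which is the bottleneck the learned embedding avoids.

```python
import numpy as np

def dtw_distance(a, b):
    """Classical dynamic time warping between two 2-D point sequences.

    Fills an (n+1) x (m+1) cost table, so a single comparison costs
    O(n*m) -- the quadratic blow-up that motivates learned embeddings.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # point-to-point distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

Identical trajectories yield a distance of zero; any spatial offset produces a positive cost.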
Methodological Contributions
A key methodological innovation is the end-to-end pipeline that transforms irregular, noisy vessel trajectories into a low-dimensional, information-preserving feature space, enabling fast and robust similarity computations.
1. Trajectory Image Generation
The approach begins by resampling the raw trajectories (with spline interpolation to handle gaps and noise) at a fixed time interval, and projecting these into a spatial grid defined by Mercator projection. Each trajectory is thereby mapped to a 2D binary image, with each grid cell encoding trajectory presence. This canonicalization normalizes spatial and temporal irregularities and mitigates non-uniform sampling.
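A minimal sketch of this step is shown below. It resamples a trajectory with a cubic spline at a fixed interval and rasterizes it into a binary occupancy grid; the Mercator projection and the paper's occupancy threshold (ε=3) are omitted for brevity, and the `bbox`/`grid` parameters are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def trajectory_to_image(times, lons, lats, bbox, grid=(66, 50), interval=5.0):
    """Resample an AIS trajectory at a fixed time interval and rasterize it
    into a binary occupancy image (Mercator projection omitted for brevity)."""
    t = np.asarray(times, dtype=float)
    # Cubic-spline resampling at a fixed step smooths gaps and irregular sampling.
    ts = np.arange(t[0], t[-1], interval)
    xs = CubicSpline(t, lons)(ts)
    ys = CubicSpline(t, lats)(ts)
    lon_min, lon_max, lat_min, lat_max = bbox
    h, w = grid
    img = np.zeros((h, w), dtype=np.uint8)
    # Map each resampled point to a grid cell and mark that cell occupied.
    cols = np.clip(((xs - lon_min) / (lon_max - lon_min) * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(((ys - lat_min) / (lat_max - lat_min) * (h - 1)).astype(int), 0, h - 1)
    img[rows, cols] = 1
    return img
```

The resulting fixed-size image is what the CAE consumes, regardless of the original trajectory's length or sampling rate.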
2. CAE Network Design
The CAE architecture comprises:
- Encoder: Four convolutional layers (filter sizes: 9×9, 7×7, 5×5, 3×3; filters per layer: 16, 16, 8, 8), each followed by 2×2 max pooling, culminating in a fully connected layer to produce a feature vector of dimension L (empirically, L = 3 suffices for accurate similarity).
- Decoder: Mirrored to the encoder (with unpooling and deconvolution) to reconstruct the input trajectory image.
Notably, the use of convolutional layers allows effective spatial feature learning; the unsupervised auto-encoding objective produces global, label-free representations.
3. Loss Function
To avoid the blurring defects associated with an L2 (MSE) objective and to increase robustness to outliers, a hybrid loss function is adopted: a weighted sum of L1 reconstruction loss and perceptually-aware Structural Similarity Index Measure (SSIM) loss (λ1=0.15, λ2=0.85). This increases fidelity especially in cases of fine trajectory structures or small-scale deviations.
4. Similarity Computation
Once the CAE is trained, only the encoder is needed to produce fixed-dimension feature vectors for new trajectory images. Similarity between two trajectories reduces to a Euclidean distance computation between their low-dimensional features, facilitating efficient, large-scale search and clustering.
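Because all trajectories are reduced to fixed-dimension vectors, the full pairwise similarity matrix is a single vectorized distance computation rather than N·(N−1)/2 alignment runs. A minimal sketch:

```python
import torch

def pairwise_similarity(features):
    """Pairwise Euclidean distances between CAE feature vectors.

    features: (N, L) tensor of encoder outputs; returns an (N, N)
    distance matrix in one vectorized call.
    """
    return torch.cdist(features, features, p=2)
```

Smaller distances indicate more similar trajectories; the matrix can feed nearest-neighbor search or clustering directly.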
5. Trajectory Clustering as Validation
To indirectly evaluate similarity effectiveness, the method applies hierarchical clustering to the CAE features. The Between-Like (BC) and Within-Like (WC) criteria, and their ratio (AC), provide quantitative clustering accuracy proxies, as there are no accepted trajectory similarity ground-truth benchmarks.
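The clustering step can be sketched with standard tools as follows; Ward linkage is an assumption here, since the paper specifies only hierarchical clustering, and the BC/WC/AC criteria are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_features(features, n_clusters=3):
    """Hierarchical clustering of CAE feature vectors.

    features: (N, L) array of encoder outputs. Ward linkage is an
    assumed choice; the paper only specifies hierarchical clustering.
    """
    Z = linkage(features, method='ward')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```

On well-separated feature vectors this recovers the expected groupings, which is the behavior the BC/WC/AC criteria quantify.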
Experimental Evaluation
Experiments span three real-world datasets (Caofeidian Port, Chengshan Cape, Yangtze River Estuary; each with >1,000 vessel trajectories). Main findings:
- Computational Efficiency: The CAE-based approach is up to two orders of magnitude faster than DTW or Fréchet for N=500 trajectories: average times ∼3.7 s (CAE) versus ∼673 s (DTW) and ∼535 s (Fréchet).
- Parameter Selection: Extensive sensitivity analysis showed that L=3 suffices for the feature vector; optimal grid size for image representation is 66×50; learning rate 0.001 ensures convergence.
- Clustering Performance: CAE-based similarity leads to markedly lower AC scores (indicative of better cluster separability), and visual inspection demonstrates more semantically coherent trajectory clusters compared to those generated with DTW or Fréchet similarity.
- Robustness: The CAE maintains high-quality performance under diverse, noisy, and non-uniformly sampled datasets.
Implementation and Practical Guidance
Data Pipeline
- Preprocessing:
- Resample AIS points via cubic spline interpolation, with fixed interval (e.g., 5 seconds).
- Spatially discretize area with grid size matched to area extents and desired image resolution.
- Create binary trajectory images via grid projection; threshold cell occupation (empirically ϵ=3).
- Model Training:
- Use PyTorch or equivalent deep learning frameworks (original experiments: PyTorch 1.0, 2080Ti GPU).
- Batch size: 200, epochs: 3000 (empirical).
- Regularization via L1+SSIM loss.
- Feature Extraction:
- Post-training, retain encoder only; map any trajectory image to a low-dimensional feature vector.
- Similarity Measurement and Clustering:
- Use Euclidean metric on feature vectors for pairwise similarity.
- Apply standard clustering methods (e.g., hierarchical clustering) to compute and validate groupings.
Code Skeleton (PyTorch):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAE(nn.Module):
    def __init__(self, input_shape=(1, 66, 50), feature_dim=3):
        super().__init__()
        # Encoder: four conv layers (9x9, 7x7, 5x5, 3x3), each followed by 2x2 max pooling.
        self.enc_conv1 = nn.Conv2d(1, 16, kernel_size=9, padding=4)
        self.enc_conv2 = nn.Conv2d(16, 16, kernel_size=7, padding=3)
        self.enc_conv3 = nn.Conv2d(16, 8, kernel_size=5, padding=2)
        self.enc_conv4 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # Dummy forward pass to infer the encoder output shape (avoids hardcoding).
        with torch.no_grad():
            x = self.forward_enc(torch.zeros((1,) + input_shape))
        self.enc_shape = x.shape[1:]   # e.g. (8, 4, 3) for a 66x50 input
        self.flat_dim = int(x.numel())
        self.out_hw = input_shape[1:]  # target spatial size for reconstruction
        # Fully connected layers to and from the latent vector
        self.fc = nn.Linear(self.flat_dim, feature_dim)
        self.fc_inv = nn.Linear(feature_dim, self.flat_dim)
        # Decoder mirrors the encoder (upsampling + transposed convolutions).
        self.unpool = nn.Upsample(scale_factor=2, mode='nearest')
        self.dec_conv4 = nn.ConvTranspose2d(8, 8, 3, padding=1)
        self.dec_conv3 = nn.ConvTranspose2d(8, 16, 5, padding=2)
        self.dec_conv2 = nn.ConvTranspose2d(16, 16, 7, padding=3)
        self.dec_conv1 = nn.ConvTranspose2d(16, 1, 9, padding=4)

    def forward_enc(self, x):
        x = F.relu(self.pool(self.enc_conv1(x)))
        x = F.relu(self.pool(self.enc_conv2(x)))
        x = F.relu(self.pool(self.enc_conv3(x)))
        x = F.relu(self.pool(self.enc_conv4(x)))
        return x

    def encode(self, x):
        x = self.forward_enc(x)
        return self.fc(x.view(x.size(0), -1))

    def decode(self, z):
        x = self.fc_inv(z).view(-1, *self.enc_shape)
        x = F.relu(self.unpool(self.dec_conv4(x)))
        x = F.relu(self.unpool(self.dec_conv3(x)))
        x = F.relu(self.unpool(self.dec_conv2(x)))
        x = torch.sigmoid(self.unpool(self.dec_conv1(x)))
        # Four 2x upsamplings of (4, 3) give 64x48; resize back to the input size.
        return F.interpolate(x, size=self.out_hw)

    def forward(self, x):
        return self.decode(self.encode(x))
```
Loss calculation with L1 and SSIM (via pytorch-ssim package):
```python
import torch.nn.functional as F
import pytorch_ssim  # third-party package providing a differentiable SSIM

def hybrid_loss(x, x_hat, lambda1=0.15, lambda2=0.85):
    # Weighted L1 reconstruction term plus a perceptual SSIM term.
    l1 = F.l1_loss(x_hat, x)
    ssim = 1 - pytorch_ssim.ssim(x_hat, x)
    return lambda1 * l1 + lambda2 * ssim
```
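Tying the pieces together, a minimal training-loop sketch is shown below. It is written generically (any reconstruction model and loss function), with hyper-parameter defaults taken from the paper's reported settings; the `train_cae` name and in-memory dataset are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_cae(model, loss_fn, images, epochs=3000, batch_size=200, lr=1e-3):
    """Generic unsupervised training loop for an auto-encoder.

    `model` maps an image batch to its reconstruction; `loss_fn(x, x_hat)`
    is e.g. the hybrid L1+SSIM loss above. Defaults follow the paper's
    reported settings (batch size 200, 3000 epochs, learning rate 0.001).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(images, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x in loader:
            x_hat = model(x)          # reconstruct the trajectory image
            loss = loss_fn(x, x_hat)  # no labels: reconstruction is the objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

After training, only `model.encode` (or the encoder half) is retained for feature extraction.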
Theoretical and Practical Implications
- Efficiency vs. Expressivity: By projecting trajectories into image space and then into a low-dimensional latent space, the method sidesteps the alignment and pairwise computation explosion in classical methods, vastly improving scalability.
- Robustness: Because the image-projection step absorbs variable trajectory lengths, sampling rates, and moderate noise, downstream similarity computation requires no per-pair alignment or extensive manual cleaning.
- Extensibility: The segmentation into images enables further use of 2D CNN advances (e.g., attention, residual connections), and is compatible with transfer learning from other spatio-temporal domains.
- Limitations: The pipeline implicitly assumes that spatial proximity and path shape are the dominant factors in trajectory similarity; contextual or semantic trajectory attributes (e.g., vessel speed, heading, or intent) are not directly encoded but could be incorporated in future versions.
- Benchmarking Need: The lack of ground-truth-annotated benchmarks for maritime trajectory similarity precludes direct supervised evaluation; development of such resources would enable rigorous, standardized algorithmic comparison.
Outlook
This architecture could be generalized to other spatial-temporal trajectory types (e.g., vehicle, pedestrian, animal migration), or extended with multi-modal vessel characteristics (course, speed, metadata) via channel augmentation. The framework's robust, scalable, and unsupervised nature makes it highly suitable for integration as a trajectory embedding module in broader maritime intelligence, monitoring, or anomaly detection platforms.
| Metric | CAE (Ours) | DTW | Fréchet |
|---|---|---|---|
| Pairwise similarity time (N=500) | ~3.7 s | ~673 s | ~535 s |
| Feature vector dimensionality | 3 | N/A | N/A |
| Robust to length/sampling | Yes | Limited | Limited |
| Clustering quality (AC score) | Lower (better) | Higher | Higher |
| Parameter sensitivity | Moderate | High | High |
| Label-free | Yes | Yes | Yes |
Conclusion
This paper demonstrates the effectiveness of a CAE-based unsupervised learning approach for vessel trajectory similarity computation. The paradigm achieves high accuracy and dramatic computational acceleration over classical methods, particularly on realistic, noisy, and large-scale AIS trajectory data. Its design also sets a precedent for further research in trajectory representation learning and similarity search in spatio-temporal domains, with clear extensibility toward both richer trajectory semantization and cross-domain adaptation.