- The paper introduces an unsupervised CAE framework that transforms vessel trajectories into low-dimensional features, enabling rapid and efficient similarity computation.
- It leverages image projection with spline interpolation and a hybrid L1+SSIM loss to robustly handle noisy, irregular AIS data.
- Experimental results demonstrate speedups of up to two orders of magnitude over DTW and Fréchet distance, together with markedly improved clustering quality.
Unsupervised CAE-Based Vessel Trajectory Similarity Computation
The paper "An Unsupervised Learning Method with Convolutional Auto-Encoder for Vessel Trajectory Similarity Computation" (2101.03169) presents an unsupervised methodology leveraging Convolutional Auto-Encoders (CAE) for calculating similarities between large-scale vessel trajectories, with a direct focus on computational efficiency and robustness to non-uniform sampling and noise.
Motivation and Background
Vessel trajectory similarity computation is foundational for numerous maritime applications including behavior modeling, anomaly detection, clustering, route planning, and maritime surveillance. Conventional distance metrics—such as dynamic time warping (DTW), Fréchet distance, edit distance, and other shape- or warping-based techniques—are widely used but suffer from high computational complexity and sensitivity to artifacts and non-uniform temporal sampling, which are intrinsic to automatic identification system (AIS) data. Deep learning methods, while successful in related temporal domains, had not been adequately explored for large-scale trajectory similarity computation of vessel data prior to this work.
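To make the computational argument concrete, the following sketch implements classical dynamic time warping for two point sequences. Filling the full cost table makes a single comparison O(n·m), so all-pairs similarity over N trajectories scales as O(N²·n·m), which is the bottleneck the learned embedding avoids.

```python
import numpy as np

def dtw_distance(a, b):
    """Classical dynamic time warping between two 2-D point sequences.

    Fills an (n+1) x (m+1) cost table, so a single comparison costs
    O(n*m) -- the quadratic blow-up that motivates learned embeddings.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # point-to-point distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

Identical trajectories yield a distance of zero; any spatial offset produces a positive cost.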
Methodological Contributions
A key methodological innovation is the end-to-end pipeline that transforms irregular, noisy vessel trajectories into a low-dimensional, information-preserving feature space, enabling fast and robust similarity computations.
1. Trajectory Image Generation
The approach begins by resampling the raw trajectories (with spline interpolation to handle gaps and noise) at a fixed time interval, and projecting these into a spatial grid defined by Mercator projection. Each trajectory is thereby mapped to a 2D binary image, with each grid cell encoding trajectory presence. This canonicalization normalizes spatial and temporal irregularities and mitigates non-uniform sampling.
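A minimal sketch of this step is shown below. It resamples a trajectory with a cubic spline at a fixed interval and rasterizes it into a binary occupancy grid; the Mercator projection and the paper's occupancy threshold (ε=3) are omitted for brevity, and the `bbox`/`grid` parameters are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def trajectory_to_image(times, lons, lats, bbox, grid=(66, 50), interval=5.0):
    """Resample an AIS trajectory at a fixed time interval and rasterize it
    into a binary occupancy image (Mercator projection omitted for brevity)."""
    t = np.asarray(times, dtype=float)
    # Cubic-spline resampling at a fixed step smooths gaps and irregular sampling.
    ts = np.arange(t[0], t[-1], interval)
    xs = CubicSpline(t, lons)(ts)
    ys = CubicSpline(t, lats)(ts)
    lon_min, lon_max, lat_min, lat_max = bbox
    h, w = grid
    img = np.zeros((h, w), dtype=np.uint8)
    # Map each resampled point to a grid cell and mark that cell occupied.
    cols = np.clip(((xs - lon_min) / (lon_max - lon_min) * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(((ys - lat_min) / (lat_max - lat_min) * (h - 1)).astype(int), 0, h - 1)
    img[rows, cols] = 1
    return img
```

The resulting fixed-size image is what the CAE consumes, regardless of the original trajectory's length or sampling rate.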
2. CAE Network Design
The CAE architecture comprises:
- Encoder: Four convolutional layers (filter sizes: 9×9, 7×7, 5×5, 3×3; filters per layer: 16, 16, 8, 8), each followed by 2×2 max pooling, culminating in a fully connected layer to produce a feature vector of dimension L (empirically, L = 3 suffices for accurate similarity).
- Decoder: Mirrored to the encoder (with unpooling and deconvolution) to reconstruct the input trajectory image.
Notably, the use of convolutional layers allows effective spatial feature learning; the unsupervised auto-encoding objective produces global, label-free representations.
3. Loss Function
To avoid the blurring defects associated with an L2 (MSE) objective and to increase robustness to outliers, a hybrid loss function is adopted: a weighted sum of L1 reconstruction loss and perceptually-aware Structural Similarity Index Measure (SSIM) loss (λ1=0.15, λ2=0.85). This increases fidelity especially in cases of fine trajectory structures or small-scale deviations.
4. Similarity Computation
Once the CAE is trained, only the encoder is needed to produce fixed-dimension feature vectors for new trajectory images. Similarity between two trajectories reduces to a Euclidean distance computation between their low-dimensional features, facilitating efficient, large-scale search and clustering.
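Because all trajectories are reduced to fixed-dimension vectors, the full pairwise similarity matrix is a single vectorized distance computation rather than N·(N−1)/2 alignment runs. A minimal sketch:

```python
import torch

def pairwise_similarity(features):
    """Pairwise Euclidean distances between CAE feature vectors.

    features: (N, L) tensor of encoder outputs; returns an (N, N)
    distance matrix in one vectorized call.
    """
    return torch.cdist(features, features, p=2)
```

Smaller distances indicate more similar trajectories; the matrix can feed nearest-neighbor search or clustering directly.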
5. Trajectory Clustering as Validation
To indirectly evaluate similarity effectiveness, the method applies hierarchical clustering to the CAE features. The Between-Like (BC) and Within-Like (WC) criteria, and their ratio (AC), provide quantitative clustering accuracy proxies, as there are no accepted trajectory similarity ground-truth benchmarks.
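The clustering step can be sketched with standard tools as follows; Ward linkage is an assumption here, since the paper specifies only hierarchical clustering, and the BC/WC/AC criteria are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_features(features, n_clusters=3):
    """Hierarchical clustering of CAE feature vectors.

    features: (N, L) array of encoder outputs. Ward linkage is an
    assumed choice; the paper only specifies hierarchical clustering.
    """
    Z = linkage(features, method='ward')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```

On well-separated feature vectors this recovers the expected groupings, which is the behavior the BC/WC/AC criteria quantify.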
Experimental Evaluation
Experiments span three real-world datasets (Caofeidian Port, Chengshan Cape, Yangtze River Estuary; each with >1,000 vessel trajectories). Main findings:
- Computational Efficiency: The CAE-based approach is up to two orders of magnitude faster than DTW or Fréchet for N=500 trajectories: average times ∼3.7 s (CAE) versus ∼673 s (DTW) and ∼535 s (Fréchet).
- Parameter Selection: Extensive sensitivity analysis showed that L=3 suffices for the feature vector; optimal grid size for image representation is 66×50; learning rate 0.001 ensures convergence.
- Clustering Performance: CAE-based similarity leads to markedly lower AC scores (indicative of better cluster separability), and visual inspection demonstrates more semantically coherent trajectory clusters compared to those generated with DTW or Fréchet similarity.
- Robustness: The CAE maintains high-quality performance under diverse, noisy, and non-uniformly sampled datasets.
Implementation and Practical Guidance
Data Pipeline
- Preprocessing:
- Resample AIS points via cubic spline interpolation, with fixed interval (e.g., 5 seconds).
- Spatially discretize area with grid size matched to area extents and desired image resolution.
- Create binary trajectory images via grid projection; threshold cell occupation (empirically ϵ=3).
- Model Training:
- Use PyTorch or equivalent deep learning frameworks (original experiments: PyTorch 1.0, 2080Ti GPU).
- Batch size: 200, epochs: 3000 (empirical).
- Regularization via L1+SSIM loss.
- Feature Extraction:
- Post-training, retain encoder only; map any trajectory image to a low-dimensional feature vector.
- Similarity Measurement and Clustering:
- Use Euclidean metric on feature vectors for pairwise similarity.
- Apply standard clustering methods (e.g., hierarchical clustering) to compute and validate groupings.
Code Skeleton (PyTorch):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAE(nn.Module):
    def __init__(self, input_shape=(1, 66, 50), feature_dim=3):
        super().__init__()
        # Encoder: four conv layers (9x9, 7x7, 5x5, 3x3), each followed by 2x2 max pooling.
        self.enc_conv1 = nn.Conv2d(1, 16, kernel_size=9, padding=4)
        self.enc_conv2 = nn.Conv2d(16, 16, kernel_size=7, padding=3)
        self.enc_conv3 = nn.Conv2d(16, 8, kernel_size=5, padding=2)
        self.enc_conv4 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # Dummy forward pass to infer the encoder output shape (avoids hardcoding).
        with torch.no_grad():
            x = self.forward_enc(torch.zeros((1,) + input_shape))
        self.enc_shape = x.shape[1:]   # e.g. (8, 4, 3) for a 66x50 input
        self.flat_dim = int(x.numel())
        self.out_hw = input_shape[1:]  # target spatial size for reconstruction
        # Fully connected layers to and from the latent vector
        self.fc = nn.Linear(self.flat_dim, feature_dim)
        self.fc_inv = nn.Linear(feature_dim, self.flat_dim)
        # Decoder mirrors the encoder (upsampling + transposed convolutions).
        self.unpool = nn.Upsample(scale_factor=2, mode='nearest')
        self.dec_conv4 = nn.ConvTranspose2d(8, 8, 3, padding=1)
        self.dec_conv3 = nn.ConvTranspose2d(8, 16, 5, padding=2)
        self.dec_conv2 = nn.ConvTranspose2d(16, 16, 7, padding=3)
        self.dec_conv1 = nn.ConvTranspose2d(16, 1, 9, padding=4)

    def forward_enc(self, x):
        x = F.relu(self.pool(self.enc_conv1(x)))
        x = F.relu(self.pool(self.enc_conv2(x)))
        x = F.relu(self.pool(self.enc_conv3(x)))
        x = F.relu(self.pool(self.enc_conv4(x)))
        return x

    def encode(self, x):
        x = self.forward_enc(x)
        return self.fc(x.view(x.size(0), -1))

    def decode(self, z):
        x = self.fc_inv(z).view(-1, *self.enc_shape)
        x = F.relu(self.unpool(self.dec_conv4(x)))
        x = F.relu(self.unpool(self.dec_conv3(x)))
        x = F.relu(self.unpool(self.dec_conv2(x)))
        x = torch.sigmoid(self.unpool(self.dec_conv1(x)))
        # Four 2x upsamplings of (4, 3) give 64x48; resize back to the input size.
        return F.interpolate(x, size=self.out_hw)

    def forward(self, x):
        return self.decode(self.encode(x))
```
Loss calculation with L1 and SSIM (via pytorch-ssim package):
```python
import torch.nn.functional as F
import pytorch_ssim  # third-party package providing a differentiable SSIM

def hybrid_loss(x, x_hat, lambda1=0.15, lambda2=0.85):
    # Weighted L1 reconstruction term plus a perceptual SSIM term.
    l1 = F.l1_loss(x_hat, x)
    ssim = 1 - pytorch_ssim.ssim(x_hat, x)
    return lambda1 * l1 + lambda2 * ssim
```
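Tying the pieces together, a minimal training-loop sketch is shown below. It is written generically (any reconstruction model and loss function), with hyper-parameter defaults taken from the paper's reported settings; the `train_cae` name and in-memory dataset are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_cae(model, loss_fn, images, epochs=3000, batch_size=200, lr=1e-3):
    """Generic unsupervised training loop for an auto-encoder.

    `model` maps an image batch to its reconstruction; `loss_fn(x, x_hat)`
    is e.g. the hybrid L1+SSIM loss above. Defaults follow the paper's
    reported settings (batch size 200, 3000 epochs, learning rate 0.001).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(images, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x in loader:
            x_hat = model(x)          # reconstruct the trajectory image
            loss = loss_fn(x, x_hat)  # no labels: reconstruction is the objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

After training, only `model.encode` (or the encoder half) is retained for feature extraction.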
Theoretical and Practical Implications
- Efficiency vs. Expressivity: By projecting trajectories into image space and then into a low-dimensional latent space, the method sidesteps the alignment and pairwise computation explosion in classical methods, vastly improving scalability.
- Robustness: Because the image-projection step absorbs variable trajectory lengths, sampling rates, and moderate noise, downstream similarity computation requires no per-pair alignment or extensive manual cleaning.
- Extensibility: The segmentation into images enables further use of 2D CNN advances (e.g., attention, residual connections), and is compatible with transfer learning from other spatio-temporal domains.
- Limitations: The pipeline implicitly assumes that spatial proximity and path shape are the dominant factors in trajectory similarity; contextual or semantic trajectory attributes (e.g., vessel speed, heading, or intent) are not directly encoded but could be incorporated in future versions.
- Benchmarking Need: The lack of ground-truth-annotated benchmarks for maritime trajectory similarity precludes direct supervised evaluation; development of such resources would enable rigorous, standardized algorithmic comparison.
Outlook
This architecture could be generalized to other spatial-temporal trajectory types (e.g., vehicle, pedestrian, animal migration), or extended with multi-modal vessel characteristics (course, speed, metadata) via channel augmentation. The framework's robust, scalable, and unsupervised nature makes it highly suitable for integration as a trajectory embedding module in broader maritime intelligence, monitoring, or anomaly detection platforms.
| Metric | CAE (Ours) | DTW | Fréchet |
|---|---|---|---|
| Pairwise similarity time (N=500) | ~3.7 s | ~673 s | ~535 s |
| Feature vector dimensionality | 3 | N/A | N/A |
| Robust to length/sampling | Yes | Limited | Limited |
| Clustering quality (AC score) | Lower (better) | Higher | Higher |
| Parameter sensitivity | Moderate | High | High |
| Label-free | Yes | Yes | Yes |
Conclusion
This paper demonstrates the effectiveness of a CAE-based unsupervised learning approach for vessel trajectory similarity computation. The paradigm achieves high accuracy and dramatic computational acceleration over classical methods, particularly on realistic, noisy, and large-scale AIS trajectory data. Its design also sets a precedent for further research in trajectory representation learning and similarity search in spatio-temporal domains, with clear extensibility toward both richer trajectory semantization and cross-domain adaptation.