Papers
Topics
Authors
Recent
Search
2000 character limit reached

Multi-Modal Vision Transformers for Crop Mapping from Satellite Image Time Series

Published 24 Jun 2024 in cs.CV | (2406.16513v1)

Abstract: Using images acquired by different satellite sensors has shown to improve classification performance in the framework of crop mapping from satellite image time series (SITS). Existing state-of-the-art architectures use self-attention mechanisms to process the temporal dimension and convolutions for the spatial dimension of SITS. Motivated by the success of purely attention-based architectures in crop mapping from single-modal SITS, we introduce several multi-modal multi-temporal transformer-based architectures. Specifically, we investigate the effectiveness of Early Fusion, Cross Attention Fusion and Synchronized Class Token Fusion within the Temporo-Spatial Vision Transformer (TSViT). Experimental results demonstrate significant improvements over state-of-the-art architectures with both convolutional and self-attention components.

Citations (2)

Summary

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.