SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Abstract: Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, general-purpose geographic location encoder learns an implicit representation of locations by matching CNN- and ViT-inferred visual patterns of openly available satellite imagery with their geographic coordinates. The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks. In our experiments, we use SatCLIP embeddings to improve prediction performance on nine diverse location-dependent tasks, including temperature prediction, animal recognition, and population density estimation. Across tasks, SatCLIP consistently outperforms alternative location encoders and improves geographic generalization by encoding visual similarities of spatially distant environments. These results demonstrate the potential of vision-location models to learn meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.
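The matching of image features to coordinates described in the abstract can be illustrated with a small sketch. The abstract does not spell out the training objective, so the code below assumes a CLIP-style symmetric cross-entropy over a batch of paired embeddings. The function names, the temperature value, and the random vectors standing in for encoder outputs are all hypothetical and are not taken from the paper.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(loc_emb, img_emb, temperature=0.07):
    """Symmetric cross-entropy over a batch of matched (location, image) pairs.

    loc_emb[i] and img_emb[i] are assumed to describe the same place;
    every other pairing in the batch acts as a negative.
    The temperature of 0.07 is an illustrative assumption.
    """
    loc = [l2_normalize(v) for v in loc_emb]
    img = [l2_normalize(v) for v in img_emb]
    n = len(loc)
    # Cosine-similarity logits: rows index locations, columns index images.
    logits = [[sum(a * b for a, b in zip(loc[i], img[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Location -> image direction: the correct image for row i is column i.
    loss_l2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image -> location direction: same computation on the transposed logits.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2l = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_l2i + loss_i2l)


# Toy batch: random vectors stand in for real encoder outputs (hypothetical data).
random.seed(0)
img_batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
aligned = contrastive_loss(img_batch, img_batch)         # perfectly matched pairs
shuffled = contrastive_loss(img_batch[::-1], img_batch)  # mismatched pairs
```

In this toy setup the loss is lower when each location embedding lines up with its own image embedding than when the pairing is scrambled. A contrastive objective of this kind rewards exactly that alignment.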