- The paper argues that satellite data's distinct spatial, temporal, and spectral characteristics demand specialized machine learning techniques.
- The paper highlights deployment challenges such as processing vast data volumes, producing dense predictions, and evaluating models with limited ground truth.
- The paper calls for advancing research in distribution shift, self-supervised learning, and multi-modal learning while addressing ethical and governance concerns.
Introduction
Machine learning on satellite data (SatML) is emerging as a pivotal domain, because satellite data differs significantly from the data modalities that standard machine learning (ML) research typically targets. This position paper argues that satellite data is a distinctive modality within ML and that its idiosyncrasies call for research agendas tailored to them.
Distinct Characteristics of Satellite Data
Satellite data, with its unusual spatial and temporal scales, rich spectral channels, and immense volume, is not well served by traditional ML methods. Objects of interest in satellite images range in size from meters to kilometers, and relevant temporal patterns span hours to decades, which is why specialized techniques are needed. Sensors across the electromagnetic spectrum provide multispectral information well beyond the standard RGB channels, yet common ML libraries offer limited support for datasets with many channels: tooling and data are clearly mismatched.
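To make the many-channel issue concrete, here is a minimal NumPy sketch of two common workarounds. It is not a method from the paper. The first standardizes each spectral band separately for an image of shape (C, H, W) with arbitrary C. The second adapts first-layer convolution weights pretrained on 3-channel RGB input to C input bands by repeating the mean RGB filter. The function names `standardize_bands` and `expand_rgb_conv_weights` are hypothetical. The 13-band example only assumes a Sentinel-2-like sensor.

```python
import numpy as np

def standardize_bands(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-band z-score for an image shaped (C, H, W); works for any number of bands C."""
    mean = image.mean(axis=(1, 2), keepdims=True)  # one mean per band
    std = image.std(axis=(1, 2), keepdims=True)    # one std per band
    return (image - mean) / (std + eps)

def expand_rgb_conv_weights(w_rgb: np.ndarray, num_bands: int) -> np.ndarray:
    """Adapt conv weights shaped (out, 3, k, k) to (out, num_bands, k, k).

    Each new band receives the mean of the three RGB filters, scaled by
    3 / num_bands, so the summed filter over input channels equals the
    original RGB sum (the response to a band-constant input is unchanged).
    """
    if w_rgb.ndim != 4 or w_rgb.shape[1] != 3:
        raise ValueError("expected weights shaped (out, 3, k, k)")
    mean_filter = w_rgb.mean(axis=1, keepdims=True)  # (out, 1, k, k)
    return np.repeat(mean_filter, num_bands, axis=1) * (3.0 / num_bands)

# Usage with hypothetical random data standing in for a 13-band scene.
rng = np.random.default_rng(0)
scene = rng.uniform(0, 10_000, size=(13, 64, 64))
z = standardize_bands(scene)
w_rgb = rng.normal(size=(16, 3, 7, 7))
w13 = expand_rgb_conv_weights(w_rgb, num_bands=13)
```

Band statistics would normally be computed over a whole training set rather than a single scene. Per-image statistics are used here only to keep the sketch self-contained.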
Deployment and Evaluation Challenges
When ML is deployed on satellite data, dense predictions are required, because SatML models are expected to map entire geographic regions. This in turn demands efficient processing of very large data volumes. Model evaluation, however, currently falls short because high-quality, uniformly sampled ground truth data is scarce. Spatial autocorrelation compounds the problem. Nearby locations tend to resemble one another, so when training and test samples come from the same neighborhoods, accuracy estimates can be overly optimistic. SatML also carries its own ethical challenges, raising questions about privacy and about potential misuse of the technology for surveillance and monitoring without community consent.
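One widely used mitigation for autocorrelation leakage is spatial block splitting. Space is divided into square blocks, and whole blocks (rather than individual points) are held out for testing, so test points are not surrounded by near-duplicate training points. The sketch below assumes projected x/y coordinates in a common unit such as meters. The function name and block size are hypothetical. Choosing a block larger than the range of spatial autocorrelation is a judgment call, and blocking does not fix non-uniformly sampled ground truth.

```python
import numpy as np

def spatial_block_split(coords, block_size, test_fraction=0.2, seed=0):
    """Return boolean (train, test) masks that never split a spatial block.

    coords: array of shape (N, 2) with projected x/y coordinates.
    block_size: side length of the square blocks, in the same units as coords.
    """
    coords = np.asarray(coords, dtype=float)
    # Integer grid cell for every point; points in the same cell share a block.
    cells = np.floor(coords / block_size).astype(np.int64)
    _, block_index = np.unique(cells, axis=0, return_inverse=True)
    block_index = block_index.reshape(-1)  # flatten for NumPy-version robustness
    n_blocks = int(block_index.max()) + 1

    # Hold out a random subset of whole blocks as the test set.
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * n_blocks)))
    test_blocks = rng.choice(n_blocks, size=n_test, replace=False)
    test = np.isin(block_index, test_blocks)
    return ~test, test

# Usage: 500 hypothetical sample points in a 10 km x 10 km area, 2 km blocks.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 10_000, size=(500, 2))
train, test = spatial_block_split(pts, block_size=2_000)
```

Comparing a model's score under this split with its score under a purely random split gives a rough sense of how much autocorrelation inflates the random-split estimate.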
Satellite Data as a Catalyst for ML Research
The distinctive traits of satellite data have the potential to invigorate research in distribution shift, self-supervised learning, multi-modal learning, and positional encoding. Distribution shift in SatML, for instance, is not merely an academic concern but a practical one, arising from the spatial, temporal, spectral, and scale variability inherent in satellite data. Self-supervised learning is especially important because unlabeled satellite data is abundant while labels are scarce, which motivates research into efficient and scalable ways of learning from, and labeling, that data. Multi-modal learning in SatML can draw on the wide assortment of sensors, posing fresh challenges for the field. Positional encoding research can benefit from the multi-scale, hierarchical structure of satellite data, from small scales to large ones, and from the fact that its locations lie on the Earth's curved surface, which is fundamentally non-Euclidean.
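The non-Euclidean point can be illustrated with a small sketch. Treating latitude and longitude as flat 2D coordinates places longitude 180° and -180° far apart, even though they are the same meridian. Mapping each location to a 3D unit vector on the sphere first avoids that. Applying sinusoids at geometrically spaced frequencies then gives coarse-to-fine features. This is a hand-crafted encoding, similar in spirit to published spherical location encoders but not the method of the paper. The function names and the frequency range are hypothetical choices.

```python
import numpy as np

def latlon_to_unit_vector(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to (x, y, z) on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def multiscale_sphere_encoding(lat_deg, lon_deg, num_scales=4,
                               min_freq=1.0, max_freq=64.0):
    """Sin/cos features of the 3D unit vector at geometrically spaced frequencies.

    Low frequencies vary slowly across the globe (coarse scale); high
    frequencies vary quickly (fine scale). Output has num_scales * 6 features.
    """
    xyz = latlon_to_unit_vector(lat_deg, lon_deg)          # (..., 3)
    freqs = np.geomspace(min_freq, max_freq, num_scales)    # (S,), coarse to fine
    angles = xyz[..., None, :] * freqs[:, None]             # (..., S, 3)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (..., S, 6)
    return feats.reshape(*feats.shape[:-2], num_scales * 6)

# Usage: the antimeridian seen from both sides, and two "different" longitudes at the pole.
enc = multiscale_sphere_encoding(np.array([10.0, 10.0]), np.array([180.0, -180.0]))
pole = multiscale_sphere_encoding(np.array([90.0, 90.0]), np.array([0.0, 120.0]))
```

Both pairs receive (numerically) identical encodings, whereas their raw longitude values differ by 360° and 120° respectively.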
Advancing the SatML Research Agenda
Realizing SatML's potential will require substantial shifts in the ML community's priorities. These include recognizing the central SatML challenges, fostering a collaborative and inclusive research community, and aligning research progress with real-world impact. Because SatML has implications for societal and environmental domains, careful attention must also be paid to ethics and to data governance, including community benefit and data stewardship. The paper therefore calls on researchers to innovate responsibly in SatML and argues for a dedicated discipline that acknowledges the field's distinct challenges and maximizes the impact of satellite data.