Google AlphaEarth Embeddings
- Google AlphaEarth (AE) embeddings are 64-dimensional, globally consistent geospatial feature vectors derived from multi-modal, multi-temporal Earth observation data at 10 m resolution.
- They are generated using deep self-supervised, contrastive, and reconstruction objectives across diverse sensor inputs, integrating optical, SAR, LiDAR, climate, and text data.
- AE embeddings enable robust downstream applications including land cover mapping, hydrological simulation, socioeconomic modeling, and disaster susceptibility analysis, while addressing challenges in transferability and interpretability.
Google AlphaEarth (AE) Embeddings are 64-dimensional, globally consistent, geospatial feature vectors derived from multi-modal, multi-temporal Earth observation data. Trained via deep self-supervised, contrastive, and reconstruction objectives, these embeddings are designed to serve as compact, information-rich representations of Earth’s surface, readily supporting a wide range of downstream remote sensing, environmental, and socioeconomic modeling tasks. Since 2025, AE embeddings have become a standard analysis-ready product, distributed as annual global layers at 10 m spatial resolution via Google Earth Engine.
1. Model Formulation, Architecture, and Training Paradigm
The AlphaEarth Foundations model, as defined by Brown et al. (Brown et al., 29 Jul 2025), learns an embedding function

$$f_\theta : (\text{coordinates},\ \text{measurements},\ \text{metadata}) \mapsto \mathbf{z} \in \mathbb{R}^{64},$$

mapping spatiotemporal coordinates, multi-sensor measurement stacks, and metadata into a dense latent vector. The model assimilates optical (Sentinel-2, Landsat 8/9), SAR (Sentinel-1), LiDAR (GEDI), DEM, climate (ERA5-Land), gravimetry (GRACE), and geolocated Wikipedia/GBIF text sources.
Encoder Composition and Latent Bottleneck
AE employs a multi-path Space–Time–Precision (STP) encoder:
- Space path: Vision Transformer-style global attention at coarse spatial scale.
- Time path: Axial attention along the temporal axis.
- Precision path: Hierarchical convolutions at finer scales.

Intermediate representations are periodically fused across the three paths via Laplacian-pyramid resampling.
The final bottleneck pools all spatial tokens into a single 64-dimensional unit-vector embedding $\mathbf{z} \in \mathbb{S}^{63}$, parameterized via a von Mises–Fisher distribution with concentration parameter $\kappa$. The complete model, including teacher and student pathways for consistency training, comprises ≈480M parameters.
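The pooling-and-normalization step can be sketched as follows; the function name and the simple mean-pooling are illustrative stand-ins, since the actual model predicts vMF parameters rather than averaging tokens directly:

```python
import numpy as np

def pool_to_unit_embedding(tokens: np.ndarray) -> np.ndarray:
    """Pool a (num_tokens, 64) stack of spatial tokens into one unit-norm
    64-d embedding; the normalized mean is the vMF mean direction."""
    mean = tokens.mean(axis=0)
    return mean / np.linalg.norm(mean)

tokens = np.random.default_rng(0).normal(size=(256, 64))
z = pool_to_unit_embedding(tokens)   # unit vector on the 63-sphere
```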
Joint Loss Function
Training is end-to-end, minimizing a composite loss of the form

$$\mathcal{L} = \lambda_{\text{rec}}\,\mathcal{L}_{\text{rec}} + \lambda_{\text{cons}}\,\mathcal{L}_{\text{cons}} + \lambda_{\text{text}}\,\mathcal{L}_{\text{text}} + \lambda_{\text{unif}}\,\mathcal{L}_{\text{unif}},$$

with the weights $\lambda$ normalizing the contribution of each term. Reconstruction losses cover multiple measurement domains; a CLIP-style alignment loss aligns embeddings with Wikipedia/GBIF text features (Brown et al., 29 Jul 2025, Houriez et al., 15 Aug 2025, Liu et al., 10 Oct 2025, Ma et al., 30 Dec 2025).
Masked autoencoding and teacher–student consistency regularize the model against input dropout, temporal truncation, and sensor modality loss. Training is performed on 5.1M spatially stratified sites over two annual periods (2017–2024), with loss weighting stratified by source and season.
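A toy sketch of the teacher–student consistency idea, using a hypothetical linear encoder and random band dropout (the real model uses the STP encoder and structured masking):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(12, 64)) / np.sqrt(12)      # toy linear "encoder" weights

def embed(x, mask=None):
    """Toy encoder: zero out masked bands, project to 64-d, L2-normalize."""
    if mask is not None:
        x = np.where(mask, x, 0.0)
    z = x @ W
    return z / np.linalg.norm(z)

x = rng.normal(size=12)                          # one pixel, 12 sensor bands
mask = rng.random(12) > 0.3                      # ~30% random band dropout
z_teacher = embed(x)                             # teacher sees all inputs
z_student = embed(x, mask)                       # student sees degraded inputs
consistency_loss = float(np.sum((z_teacher - z_student) ** 2))
```

Minimizing such a consistency term pushes the encoder to produce the same embedding whether or not a sensor modality is missing.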
2. Embedding Definition, Release, and Mathematical Properties
Each embedding is a 64-dimensional real vector per 10 m ground pixel, produced once per calendar year: $\mathbf{z}_{x,y,\text{year}} \in \mathbb{R}^{64}$. Spatial and temporal positioning are encoded through explicit sinusoidal functions over latitude, longitude, and date; measurement contexts are projected to a common space and concatenated (Brown et al., 29 Jul 2025). The output is quantized to signed 8-bit integers, or represented in double precision for some assets (Zvonkov et al., 4 Nov 2025).
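A sketch of sinusoidal positional features over latitude, longitude, and date; the frequency schedule, periods, and feature count here are illustrative, not the published model's:

```python
import numpy as np

def sinusoidal_position_features(lat, lon, day_of_year, n_freqs=4):
    """Encode (lat, lon, date) as sin/cos features at doubling frequencies.
    Returns 3 inputs * n_freqs bands * 2 phases = 24 features for n_freqs=4."""
    feats = []
    for value, period in ((lat, 180.0), (lon, 360.0), (day_of_year, 365.25)):
        for k in range(n_freqs):
            omega = 2 ** k * 2 * np.pi / period
            feats.extend([np.sin(omega * value), np.cos(omega * value)])
    return np.array(feats)

f = sinusoidal_position_features(47.5, -122.3, 200)
```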
Annual embedding fields (2017–2024) are published as global Google Earth Engine (GEE) assets:
`GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL` (Brown et al., 29 Jul 2025, Houriez et al., 15 Aug 2025, Zvonkov et al., 4 Nov 2025).
The embeddings can be profiled mathematically by:
- Averaging over region or time for static region-level descriptors,
- Computing per-dimension mean/std for standardization,
- Employing cosine similarity for geospatial similarity, domain adaptation, or clustering (Qu et al., 4 Jan 2026).
No explicit normalization is applied in the standard GEE assets. Dimensionality reduction (e.g., PCA truncation at a chosen explained-variance threshold) is used downstream in some settings but yields only marginal benefit over the full 64-dimensional vectors (Cheng et al., 12 Jan 2026).
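These profiling operations reduce to a few lines of array code; the synthetic matrix below stands in for sampled per-pixel embeddings:

```python
import numpy as np

def standardize(E):
    """Per-dimension z-scoring of an (n_pixels, 64) embedding matrix."""
    return (E - E.mean(axis=0)) / E.std(axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
E = rng.normal(size=(1000, 64))      # stand-in for sampled pixel embeddings
region_a = E[:500].mean(axis=0)      # static region-level descriptor
region_b = E[500:].mean(axis=0)
sim = cosine_similarity(region_a, region_b)
```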
3. Integration in Downstream Geospatial and Environmental Workflows
AE embeddings serve as plug-and-play feature vectors for a variety of ML models:
- Land Cover & Vegetation Mapping: Off-the-shelf random forests, LightGBM, or logistic regression can be trained on single-year or multi-year AE stacks to predict class labels at pixel or region scale (Houriez et al., 15 Aug 2025, Zvonkov et al., 4 Nov 2025).
- Hydrological Simulation: Region-aggregated, standardized AE vectors are concatenated with time-resolved weather/catchment forcings, replacing hand-crafted attributes (e.g., CAMELS) as static descriptors in LSTM or MLP-RNN predictors (Qu et al., 4 Jan 2026).
- Socioeconomic Modeling: AE features serve as node embeddings in GNNs for poverty mapping, paired with survey clusters or settlements, with graph edges informed by spatial or probabilistic match heuristics (Pettersson et al., 3 Nov 2025).
- Agricultural Yield & Practice Prediction: Average AE embeddings at field/county level drive regression/classification for crop yield, tillage, and cover cropping, benchmarked against process-driven remote sensing featurizations (Ma et al., 30 Dec 2025).
- Disaster Susceptibility: Binary classifiers (CNN1D, CNN2D, ViT) for landslide susceptibility operate directly on full or PCA-truncated AE vectors, consistently outperforming traditional landslide conditioning factors (Cheng et al., 12 Jan 2026).
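The hydrological pattern above — broadcasting a static, region-mean AE descriptor along the time axis and concatenating it with dynamic forcings — can be sketched as follows (array shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
ae_static = rng.normal(size=64)        # region-mean AE embedding (static)
forcings = rng.normal(size=(365, 5))   # daily weather forcings (dynamic)

# Tile the static descriptor along time and concatenate per step, replacing
# hand-crafted catchment attributes as the static input to an LSTM.
X = np.concatenate([forcings, np.tile(ae_static, (365, 1))], axis=1)
```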
Typical ML pipeline steps:
- Download or sample per-pixel annual AE embeddings from GEE.
- Optionally aggregate spatially or temporally as needed (mean-pooling, PCA).
- Assemble training data for supervised learning.
- Fit classifiers/regressors (RF, XGB, MLP, LSTM, GCN); threshold or interpret as appropriate.
Illustrative GEE and Python code is published for end-to-end prototyping (Brown et al., 29 Jul 2025, Zvonkov et al., 4 Nov 2025).
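A minimal end-to-end illustration of these steps on synthetic data (the nearest-centroid classifier and two-class structure are placeholders; real workflows sample the GEE asset and typically fit random forests or gradient boosting):

```python
import numpy as np

# Synthetic stand-in for a labeled (n_pixels, 64) embedding matrix.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 64)),    # class 0 pixels
               rng.normal(1.5, 1.0, (200, 64))])   # class 1 pixels
y = np.repeat([0, 1], 200)

# Minimal nearest-centroid classifier as a stand-in for RF/XGB.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(pixels):
    """Assign each pixel to the class with the nearest centroid."""
    d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

acc = float((predict(X) == y).mean())
```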
4. Comparative Performance and Empirical Evaluations
AE embeddings consistently outperform, or are competitive with, both classical hand-crafted and contemporary ML-based remote sensing features:
| Task Domain | AlphaEarth (AE) Performance | Notable Next-Best | AE vs Alternative | Reference |
|---|---|---|---|---|
| Land cover, LCMAP BA (%) | 89.7 | 84.3 (MOSAIKS) | 5.4 pp lower error | (Brown et al., 29 Jul 2025) |
| Vegetation type (validation, USA/Canada) | 0.73 ACC (RF, 13 classes, Canada) | - | Comparable to in-domain USA | (Houriez et al., 15 Aug 2025) |
| Crop mapping (Togo, F1) | 0.745 (vs Presto 0.808) | Presto (0.808) | Slightly underperforms Presto | (Zvonkov et al., 4 Nov 2025) |
| Hydrology (NSE OOS, LSTM) | 0.612 | 0.553 (CAMELS attr) | +0.059 gain in spatial transfer | (Qu et al., 4 Jan 2026) |
| Poverty mapping (Sub-Saharan R²) | 0.55 (image-only) | 0.57 (image+ego GNN) | Marginal graph improvement | (Pettersson et al., 3 Nov 2025) |
| Landslide susceptibility (F1, CNN2D) | 0.97 (Emilia) | 0.81 (LCFs) | +15–16% F1, +0.04–0.11 AUC | (Cheng et al., 12 Jan 2026) |
| Crop yield (county, R²) | 0.79–0.80 (XGB, Corn) | 0.74–0.76 (RS baseline) | Better on local, not on transfer | (Ma et al., 30 Dec 2025) |
Overall, in "max-trial" balanced-accuracy or $R^2$ tasks across regional and global benchmarks, AE achieves a 23.9% mean error reduction over alternatives, especially where label scarcity is a constraint (Brown et al., 29 Jul 2025).
5. Limitations, Transferability, and Interpretability
AE embeddings show several practical limitations:
- Spatial transferability: Performance degrades when transferring across highly dissimilar regions, attributed to embedding bias from static geophysical inputs and region-specific training data (Ma et al., 30 Dec 2025).
- Temporal sensitivity: Annual embedding cadence restricts in-season or sub-annual inference; performance is modulated by EO data availability in the aggregation period (Ma et al., 30 Dec 2025, Zvonkov et al., 4 Nov 2025).
- Interpretability: Dimensions (“A00–A63”) lack explicit physical meaning; feature importance can be assigned, but semantics are opaque (Ma et al., 30 Dec 2025).
- Black-box nature: Internal architectural and training specifics are not always transparent in public products; hyperparameters, tiling, loss weighting, and embedding normalization are fixed (Zvonkov et al., 4 Nov 2025, Qu et al., 4 Jan 2026).
- Storage: Asset size per pixel (e.g., 512 B in double; 64 B in int8) is higher than some alternatives (e.g., Presto, 256 B) (Zvonkov et al., 4 Nov 2025).
- Domain coverage: AE may excel on physically grounded or “morphology-driven” tasks but is limited for human activity or socioeconomic mapping without multimodal adaptation (Liu et al., 10 Oct 2025).
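The storage figures above follow directly from the vector length (64 dims × 8 B = 512 B per pixel in double precision; 64 dims × 1 B = 64 B in int8). A sketch of symmetric int8 quantization, noting that the scheme actually used for the public assets is not documented here:

```python
import numpy as np

z = np.random.default_rng(1).uniform(-1, 1, 64)   # one float64 embedding

# Symmetric int8 quantization: map the largest magnitude to +/-127.
scale = np.abs(z).max() / 127.0
z_int8 = np.clip(np.round(z / scale), -127, 127).astype(np.int8)
z_restored = z_int8 * scale                        # dequantized approximation
max_err = float(np.abs(z - z_restored).max())      # bounded by scale / 2
```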
6. Extensions: Human-Centered Enrichment and Multimodal Alignment
Baseline AE representations encode physical, spectral, and environmental context but capture little about functional use or human-centered semantics. A noteworthy extension, AETHER, introduces POI-guided contrastive alignment: POI (Point of Interest) text embeddings are matched with local AE feature pools via a lightweight two-layer MLP projector and cross-modal contrastive loss. This yields enriched 128-dimensional embeddings (Liu et al., 10 Oct 2025).
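The cross-modal contrastive objective can be sketched with a generic InfoNCE-style loss over matched text/pixel pairs (AETHER's exact loss, temperature, and projector details may differ):

```python
import numpy as np

def info_nce(text_emb, pixel_emb, temperature=0.07):
    """InfoNCE-style contrastive loss between L2-normalized POI-text and
    AE-pixel embeddings; matched pairs sit on the diagonal."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    logits = t @ p.T / temperature
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(logits))
    return float(-log_probs[idx, idx].mean())

rng = np.random.default_rng(0)
pairs = rng.normal(size=(8, 128))
loss_matched = info_nce(pairs, pairs)               # perfectly aligned pairs
loss_random = info_nce(pairs, rng.normal(size=(8, 128)))
```

Minimizing this loss pulls each POI-text embedding toward its co-located AE feature pool while pushing it away from all other locations in the batch.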
AETHER demonstrates, relative to AE-only features in Greater London:
- a +7.2% relative F1 improvement in urban land-use classification, and
- a 23.6% KL-divergence reduction in occupational distribution mapping.

This suggests that joint EO–semantic embedding strategies can close the functionality/meaning gap in urban and socioeconomic analytics.
The modular adapter design of AETHER allows for broader integration with LLMs, multimodal graph encoders, and future Earth Observation backbone improvements.
7. Applications and Practical Guidelines
AE embeddings have been productively deployed in the following broad applications:
- Earth system monitoring (land-use/cover, biomass, hydrology, ET),
- Food/agricultural system quantification (crop mapping, yield estimation, tillage/cover detection),
- Natural disaster modeling (landslide, flood susceptibility) (Cheng et al., 12 Jan 2026, Qu et al., 4 Jan 2026),
- Socioeconomic inference (poverty mapping, urban zoning) (Pettersson et al., 3 Nov 2025, Liu et al., 10 Oct 2025),
- Rapid prototyping and “low-shot” supervised learning pipelines (Brown et al., 29 Jul 2025).
Practical usage involves mean-pooling or PCA for region-level encoding, concatenation with dynamic variables, and plug-and-play use with downstream DNN, RF, or GCN architectures. Cosine similarity in embedding space can be used for analog region search, transfer learning, or regime clustering (Qu et al., 4 Jan 2026). For highest predictive accuracy, embedding-based donor selection or clustering (e.g., $k$-nearest neighbors in AE space) is favored over purely geographic heuristics.
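A minimal sketch of nearest-donor selection in AE space (the helper name and region setup are illustrative):

```python
import numpy as np

def nearest_donors(target, candidates, k=3):
    """Return indices of the k candidate regions whose mean AE embeddings
    are most cosine-similar to the target region's mean embedding."""
    t = target / np.linalg.norm(target)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ t
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(7)
regions = rng.normal(size=(50, 64))                 # mean embeddings, 50 regions
target = regions[10] + 0.05 * rng.normal(size=64)   # near-duplicate of region 10
top = nearest_donors(target, regions, k=3)          # region 10 ranks first
```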
Limitations in transferability, interpretability, and temporal granularity motivate continued development toward regionally balanced, temporally resolved, and human-understandable embedding variants. Potential enhancements include denser temporal embedding products and explicit concept attribution.
Key references: (Brown et al., 29 Jul 2025, Houriez et al., 15 Aug 2025, Zvonkov et al., 4 Nov 2025, Ma et al., 30 Dec 2025, Pettersson et al., 3 Nov 2025, Cheng et al., 12 Jan 2026, Qu et al., 4 Jan 2026, Liu et al., 10 Oct 2025).