Geometric Foundation Models (GFMs)

Updated 22 February 2026
  • Geometric Foundation Models (GFMs) are pre-trained models that leverage intrinsic non-Euclidean structures to capture spatial, graph, and 3D relationships.
  • GFMs employ self- and weakly supervised learning with geometry-specific pretext tasks to enhance transferability and few-shot adaptation across domains.
  • They integrate structural balancing and advanced pretraining objectives, enabling superior performance on benchmarks in geospatial, graph, and 3D vision applications.

A Geometric Foundation Model (GFM) is a pre-trained model whose architecture and training procedures are expressly tailored to leverage the geometric or structural properties of its input domain. GFMs have been introduced across geospatial, graph, 3D vision, and point cloud learning, using self-supervised or weakly supervised objectives to arrive at task-agnostic representations that can be efficiently adapted to a range of downstream applications. Distinct from classical foundation models—which absorb domain-specific data into a Euclidean embedding space and often overlook intrinsic geometric structure—GFMs systematically encode information on non-Euclidean manifolds, leverage structural graph invariants, or directly operate on geometric tokens such as subtrees, cycles, or point-cloud neighborhoods. This design enables high transferability, few-shot adaptation, and superior performance in settings where geometric inductive biases are critical.

1. Mathematical and Architectural Foundations

A GFM is formally a neural operator

f_{\theta}: \mathcal{X} \to \mathbb{R}^d

where \mathcal{X} is a geometric or structured input space—such as \mathbb{R}^3 point clouds, geolocation tuples (\phi, \lambda), or graph adjacency matrices—and f_{\theta} is pretrained to minimize a self-supervised or weakly supervised loss, often exploiting domain- or geometry-informed pretext tasks. For instance, geospatial GFMs use pixel-time-series or coordinate-based contrastive objectives (Purohit et al., 21 Jan 2025); graph GFMs may regress graph invariants or perform manifold-aware message passing (Sun et al., 5 Feb 2025, Sun et al., 6 Aug 2025).
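As a concrete illustration of a geometry-informed input encoding, the sketch below maps a geolocation tuple (\phi, \lambda) to the unit sphere and then to multi-scale sinusoidal features. This is a hypothetical minimal encoder, not the design of any cited model; the function name and feature scheme are assumptions for illustration.

```python
import math

def geo_encode(lat_deg, lon_deg, num_freqs=4):
    """Encode a geolocation (phi, lambda) as Fourier features of its
    3D unit vector on the sphere: a simple geometry-aware tokenizer."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    # Lift to the unit sphere first: this avoids the discontinuity
    # at the +/-180 degree longitude wrap-around.
    x, y, z = (math.cos(phi) * math.cos(lam),
               math.cos(phi) * math.sin(lam),
               math.sin(phi))
    feats = []
    for k in range(num_freqs):          # multi-scale sinusoidal features
        for c in (x, y, z):
            feats.append(math.sin(2 ** k * math.pi * c))
            feats.append(math.cos(2 ** k * math.pi * c))
    return feats                        # length 6 * num_freqs

# Spatially nearby points map to nearby feature vectors.
e1 = geo_encode(48.85, 2.35)
e2 = geo_encode(48.86, 2.36)
```

The spherical lift is the geometric part: a raw (lat, lon) embedding would treat longitudes 179.9 and -179.9 as maximally distant, while their unit-sphere coordinates nearly coincide.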

Non-Euclidean geometry is intrinsic in many GFM architectures: Riemannian product bundles (hyperbolic × spherical) for graph substructure embeddings (Sun et al., 5 Feb 2025), attention and residuals re-expressed on Riemannian manifolds with learned curvature (He et al., 11 Apr 2025), or parallel multi-algebra message passing (real/complex/split-complex/dual) for knowledge graphs (Xin et al., 28 Dec 2025). Architectures also include latent diffusion backbones (for geospatial imagery) (Jia et al., 10 Mar 2025), and Geometric Neural Operators for point clouds (Quackenbush et al., 6 Mar 2025). Billion-scale GFMs employ Transformer variants adapted for graph heterogeneity and structure-aware attention (Bechler-Speicher et al., 4 Feb 2026).
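The Riemannian operations mentioned above can be made concrete in the Poincaré-ball model. The following is a minimal sketch, not any cited paper's implementation, of Möbius addition (the hyperbolic analogue of a Euclidean residual connection) and the induced geodesic distance on a ball of curvature -c:

```python
import math

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball of curvature -c:
    the hyperbolic analogue of a Euclidean residual x + y."""
    xy = sum(a * b for a, b in zip(x, y))   # <x, y>
    xx = sum(a * a for a in x)              # ||x||^2
    yy = sum(b * b for b in y)              # ||y||^2
    num_x = 1 + 2 * c * xy + c * yy
    num_y = 1 - c * xx
    denom = 1 + 2 * c * xy + c * c * xx * yy
    return [(num_x * a + num_y * b) / denom for a, b in zip(x, y)]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance; grows without bound as points near the boundary."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    return (2 / math.sqrt(c)) * math.atanh(math.sqrt(c) * norm)
```

Because distance blows up near the boundary, hierarchies gain exponentially more "room" per unit radius than in Euclidean space, which is why tree-like substructures are routed to hyperbolic components in these architectures.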

2. Pretraining Objectives and Data Distributions

The pretraining regime for GFMs is explicitly geometric or structural:

  • Spatial or Structural Balancing: Experiments with geospatial GFMs show that globally balanced spatial pretraining distributions (e.g., uniform random, or stratified by biome/continent) produce more robust representations than clustered distributions (e.g., sampling only from cities or forests), especially in few-shot settings (Purohit et al., 21 Jan 2025).
  • Contrastive/View-based Pretext Tasks: Graph and geospatial models use contrastive learning over spatial-temporal contexts, two-geometry views (hyperbolic vs. spherical) (Sun et al., 5 Feb 2025), or universal link-prediction templates (Yuan et al., 5 Nov 2025). Point-cloud GFMs train to regress local geometric quantities (metric, curvature, normals) under noise/outlier regimes (Quackenbush et al., 6 Mar 2025).
  • Graph Invariant Regression and Structured Positional Encoding: Approaches such as GraphProp pretrain by regressing a suite of graph invariants (Fiedler value, clique number, Lovász number, etc.), enforcing structural generality and enabling highly transferable node or graph representations (Sun et al., 6 Aug 2025).
  • Graphon-based Generative Vocabularies: GRAVER learns generative graph vocabularies via graphon estimation to augment few-shot support sets and stabilize fine-tuning (Yuan et al., 5 Nov 2025).
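To ground the invariant-regression idea, the sketch below computes a few graph invariants that could serve as pretext regression targets. Note this is a simplified stand-in: GraphProp's actual suite includes spectral and combinatorial invariants such as the Fiedler value and Lovász number, which need an eigensolver; the invariants here (components, triangles, maximum degree) are chosen only because they are computable in a few lines.

```python
def graph_invariants(n, edges):
    """Simple combinatorial invariants of an undirected graph on
    vertices 0..n-1, the kind of structural targets an
    invariant-regression pretext task could use."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Each triangle is counted once per edge (3 times total).
    triangles = sum(1 for u, v in edges for w in adj[u] & adj[v]) // 3

    # Connected components via depth-first search.
    seen, comps = set(), 0
    for s in range(n):
        if s in seen:
            continue
        comps += 1
        stack = [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            stack.extend(adj[u] - seen)

    return {"components": comps, "triangles": triangles,
            "max_degree": max(len(adj[v]) for v in adj)}

inv = graph_invariants(3, [(0, 1), (1, 2), (0, 2)])  # a triangle graph
```

Because such targets depend only on structure, never on node attributes, a model regressing them is pushed toward attribute-free, domain-transferable representations, which is the mechanism the GraphProp bullet above describes.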

A rigorous ablation in the spatial domain demonstrated that the choice of sampling distribution during pretraining may affect the downstream F₁-score by up to 10% under extreme label scarcity, highlighting the primacy of data diversity and geometric coverage (Purohit et al., 21 Jan 2025).
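The balanced-sampling idea reduces to a familiar recipe: group candidate tiles by a spatial stratum and sample uniformly within each group. The sketch below is a toy illustration; the hemisphere stratum function and tile pool are invented for the example, whereas the cited work stratifies by biome or continent.

```python
import random

def stratified_sample(points, strata_of, k_per_stratum, seed=0):
    """Draw a spatially balanced pretraining set: bucket candidate
    tiles by a stratum label and sample uniformly within each bucket."""
    rng = random.Random(seed)
    groups = {}
    for p in points:
        groups.setdefault(strata_of(p), []).append(p)
    sample = []
    for _stratum, members in sorted(groups.items()):
        k = min(k_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Toy candidate pool of (lat, lon) tiles, stratified by hemisphere.
pool = [(lat, lon) for lat in range(-60, 61, 10)
                   for lon in range(-180, 180, 30)]
picked = stratified_sample(pool, lambda p: p[0] >= 0, 5)
```

A clustered baseline would instead draw all tiles from one region; the ablation cited above suggests that under label scarcity this choice alone can move downstream F₁ by up to 10%.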

3. Model Classes and Task Families

Geometric Foundation Models have been instantiated across several domains and architectures:

| Domain | Model Classes | Core Tasks Supported |
| --- | --- | --- |
| Geospatial | Temporal Transformers, Diffusion U-Nets, ViTs | Land cover classification, segmentation, biome identification |
| Graph | Product-bundle Riemannian GNNs, Graph Transformers | Node/edge classification, link prediction, graph-level tasks |
| Knowledge Graph | Multi-algebra NBFNets, Parallel Message Passing | Zero-shot link prediction, reasoning on unseen entities/relations |
| Point Cloud | Geometric Neural Operators | Metric/curvature estimation, geometric PDE, shape flow |
| 3D Vision | End-to-end 3D ViTs, Diffusion GFMs | Depth estimation, 3D reconstruction, pose estimation, synthesis |

Each class precisely encodes the relevant geometric structure: e.g., RiemannGFM decomposes graphs into a vocabulary of rooted trees and small cycles, embedding each on an optimally matched constant-curvature manifold (Sun et al., 5 Feb 2025); SatDiFuser leverages noise-conditioned U-Net features from pretrained generative diffusion models (Jia et al., 10 Mar 2025). In knowledge graphs, Gamma employs multi-head message passing, with each head operating in a different algebraic domain, to jointly encode symmetry, anti-symmetry, hierarchical, and translation patterns (Xin et al., 28 Dec 2025).
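The substructure-vocabulary idea can be sketched in a few lines: one natural "token" for a node is its depth-limited BFS tree. This is a simplified illustration only; RiemannGFM's actual vocabulary also includes cycles, and the extraction details below are assumptions.

```python
from collections import deque

def subtree_token(adj, root, depth=2):
    """Extract the depth-limited BFS tree rooted at `root`, one way to
    form the substructure 'tokens' a geometry-aware graph model embeds.
    `adj` maps each vertex to a list of neighbours."""
    token_edges, dist = [], {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        if dist[u] == depth:        # stop expanding at the depth budget
            continue
        for v in adj[u]:
            if v not in dist:       # first visit -> tree edge
                dist[v] = dist[u] + 1
                token_edges.append((u, v))
                q.append(v)
    return sorted(token_edges)

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
token = subtree_token(star, 0, depth=1)
```

Since every graph yields such tokens regardless of its node attributes, they can play the role that subword tokens play in language models, which is what enables the attribute-free transfer discussed in Section 4.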

4. Transferability, Adaptation, and Scaling Laws

GFMs are characterized by cross-domain, few-shot, and zero-shot transferability:

  • Attribute-/Token-Free Generalization: RiemannGFM demonstrates transfer without access to node attributes or language tokens by treating geometric substructures as universal graph “tokens” (Sun et al., 5 Feb 2025).
  • Structural vs. Attribute Decoupling: GraphProp achieves superior performance (up to +6–10 points in accuracy over baselines) on both attributed and structure-only graphs, with the structural phase forced to encode information invariant to domain (Sun et al., 6 Aug 2025).
  • Dynamic Geometry/Task Adaptivity: Evidence from a position paper indicates that optimal GFM performance requires per-task geometry selection (matching curvature, e.g., hyperbolic for trees, spherical for cycles). Dynamic mixture-of-experts routing and product-manifold embeddings yield further reductions in representation distortion and task loss (He et al., 11 Apr 2025).
  • Scaling Laws in Graph GFMs: Billion-scale experiments with GraphBFF Transformers reveal power-law scaling of loss with both model size and data size, similar to LLM/vision FMs, with exponents \alpha_N \sim 0.7 (model-limited) and \alpha_D \sim 0.18 (data-limited), and robust transfer to unseen graphs (Bechler-Speicher et al., 4 Feb 2026).

Empirical studies report near-isometric embedding of hierarchical/cyclic data by appropriate non-Euclidean GFMs, exponentially lower distortion for trees in \mathbb{H}^2 versus \mathbb{R}^d, and strict accuracy gains from combining geometric heads for knowledge graphs (Xin et al., 28 Dec 2025, He et al., 11 Apr 2025).

5. Benchmarks and Empirical Evaluations

Multiple standardized benchmarks now enable systematic comparison of GFMs:

  • Geospatial Benchmarks: PANGAEA provides a global evaluation suite—spanning resolution, modality, temporality, and region—for GFMs, reporting that no single GFM trait (size, architecture, dataset) guarantees universal performance. Supervised baselines (U-Net, ViT) can match or exceed GFMs with abundant labels, but GFMs show notable advantage in label-scarce regimes (Marsocci et al., 2024). Balanced spatial pretraining is confirmed crucial (Purohit et al., 21 Jan 2025).
  • 3D Vision: E3D-Bench evaluates 16 GFMs on sparse/dense depth estimation, 3D reconstruction, pose estimation, and view synthesis, showing that end-to-end GFMs can generalize across data sources, but fail on extreme distribution gaps or metric-scale recovery. Performance depends substantially on backbone architecture and 2D feature extraction strategy (Cong et al., 2 Jun 2025).
  • Graph Transfer: Comparative experiments document superior performance of GRAVER’s generative graph vocabulary augmentation on one-shot node/graph classification, with state-of-the-art accuracy and improved stability in fine-tuning (Yuan et al., 5 Nov 2025). RiemannGFM outperforms LLM-fused and self-supervised GNNs on transfer to non-attributed graphs (Sun et al., 5 Feb 2025).
  • Knowledge Graphs: The Gamma model demonstrates strictly higher expressivity and accuracy (up to +7% MRR) over parametrically matched single-algebra baselines, with gains isolated to the geometric mechanism rather than parameter count (Xin et al., 28 Dec 2025).

6. Challenges, Limitations, and Directions

GFMs face open challenges and limitations:

  • Geometry/Pretraining Universe Selection: Key open questions concern the optimal selection or discovery of geometric/structural “tokens” (e.g., moving beyond trees and short cycles), the definition of the pretraining universe to maximize cross-domain utility, and the tradeoff between rare-type inclusion and efficiency (Bechler-Speicher et al., 4 Feb 2026, Sun et al., 5 Feb 2025).
  • Computational Bottlenecks: Message passing and manifold operations scale worse than their Euclidean counterparts; efficient libraries and hardware acceleration for Riemannian operations remain targets (He et al., 11 Apr 2025).
  • Structural Bias and Representational Robustness: Both spatial and graph GFMs can exhibit severe drops when pretraining and test domains are mismatched in scale, region, or geometry. Robustness to adversarial or distributional shifts is not currently guaranteed (Marsocci et al., 2024).
  • Explainability and Interpretability: The composition of non-Euclidean and algebraic heads, and the geometry-adaptive mechanisms, pose unique explainability challenges—e.g., interpreting MoE attention weights in Gamma or the role of graphon-based vocabularies in GRAVER.
  • Future Research: Proposed advancements include (i) curvature-adaptive architectures (curvature per layer or head), (ii) mixed-modality and cross-modal fusion (optical/SAR, vision-language GFMs), and (iii) data- and geometry-centric benchmarks for nuanced evaluation (He et al., 11 Apr 2025, Marsocci et al., 2024). Extensions to continual and parameter-efficient adaptation protocols are also under development (Bechler-Speicher et al., 4 Feb 2026).

Geometric Foundation Models represent an architectural and theoretical advance in the foundation model paradigm, establishing principled pathways for harnessing non-Euclidean and structural inductive biases at scale. Their development and deployment are guided by domain-specific pretraining objectives, empirical evidence on data diversity, and cross-domain, geometry-aware evaluation—setting the technical foundations for broad, geometry-aware machine learning.
