
Atlas-Free Voxel-Level Models

Updated 22 February 2026
  • Atlas-free voxel-level models represent volumetric data without pre-defined anatomical templates, preserving unbiased, native-resolution spatial fidelity.
  • They employ advanced architectures like 3D U-Nets, Vision Transformers, and FPNs with self-supervised and contrastive learning techniques.
  • These models enhance segmentation, classification, and generalization in 3D medical imaging, neuroimaging, and materials science applications.

Atlas-free voxel-level foundation models constitute a paradigm within machine learning that enables direct modeling and representation of volumetric data at native spatial resolutions, without recourse to external anatomical templates or spatial atlases. These models are predominantly applied in 3D medical imaging, neuroimaging, and materials science, and are characterized by their ability to generalize, segment, or encode arbitrary input volumes on a per-voxel basis, relying entirely on data-driven priors and learned hierarchical features. The absence of explicit atlas-based alignment distinguishes them from traditional region-of-interest (ROI) or atlas-parcellated approaches, ensuring unbiased spatial representation and maximal spatial fidelity.

1. Foundational Concepts and Motivations

Atlas-free, voxel-level foundation models address the need for unbiased, generalizable volumetric representation learning in domains where spatial correspondence to a template is infeasible, potentially misleading, or computationally prohibitive. In 3D medical imaging and neuroimaging, imposing an atlas or spatial template may introduce interpolation artifacts or anatomical bias, or may fail to account for population-level variability, especially in pathologic or cross-modality settings. These models forgo any spatial normalization, external coordinate warping, or handcrafted region definitions. Instead, they employ strategies that capture multi-scale, local-to-global information across the full native voxel grid, learning representations that transfer across downstream tasks such as classification, regression, segmentation, or structural analysis (An et al., 11 Jul 2025, Wang et al., 26 Dec 2025, Wang et al., 30 Jan 2026, He et al., 2024).

2. Model Architectures and Embedding Strategies

Several families of atlas-free voxel-level models have emerged, differing in their neural architectures, pretraining objectives, and input paradigms.

The distinguishing feature remains the commitment to learning on the native voxel grid (no affine alignment or parcellation), with only local spatial pooling, patchification, or random spatial masking as architectural biases.
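To illustrate learning directly on the native voxel grid, the sketch below splits a volume into non-overlapping cubic patch tokens with NumPy. The function name and patch size are illustrative choices, not taken from any of the cited models:

```python
import numpy as np

def patchify_3d(volume, patch=16):
    """Split a 3D volume into non-overlapping cubic patches (tokens).

    A minimal sketch of patchification on the native voxel grid;
    no registration, resampling, or atlas alignment is performed.
    """
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    v = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    # Reorder axes so each token is one flattened patch: (num_tokens, patch**3)
    tokens = v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, patch ** 3)
    return tokens

vol = np.random.rand(64, 64, 64)
tokens = patchify_3d(vol, patch=16)
print(tokens.shape)  # (64, 4096): 4x4x4 tokens, each a 16^3 patch
```

Each token row would then be linearly embedded and fed to a transformer or hierarchical encoder.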

3. Training Regimes and Data Pipelines

Atlas-free models are typically trained under large-scale self-supervised or multi-source supervised paradigms:

  • Self-Supervised Learning (SSL): Masked reconstruction (MAE), hierarchical contrastive frameworks (vox2vec, Adam), and teacher–student momentum (DINOv2 in TAP-CT) are widely employed. In SSL, the foundation model may be pretrained on hundreds of thousands of volumes (e.g., TAP-CT: 105K CTs; polycrystal informatics: 100K microstructures) (Veenboer et al., 30 Nov 2025, Wei et al., 7 Dec 2025).
  • Synthetic Data and Domain Randomization: Models such as SynthFM-3D and vesselFM use mathematically-parameterized generators to synthesize anatomically and contrast-diverse volumes, supporting analytical control over label evolution, appearance, and noise, and enabling zero/few-shot generalization to modalities absent from the real training set (Chakrabarty et al., 18 Jan 2026, Wittmann et al., 2024).
  • Multi-branch and Multi-modal Training: VISTA3D integrates "prompt-indexed" automatic heads and supervoxel-distilled interactive heads, combining robust class-based annotation pipelines with zero-shot region segmentation via distilled supervoxels from 2D backbones (He et al., 2024).
  • Dynamic or Adaptive Subsampling: To mitigate the high memory/compute demand, mechanisms such as dynamic patch partitioning (Omni-fMRI), top-k temporal window selection (SLIM-Brain), and selective patch merging are employed to focus learning capacity on salient or information-rich subvolumes (Wang et al., 30 Jan 2026, Wang et al., 26 Dec 2025).

Typical implementations eschew anatomical registration, spatial normalization, or handcrafted label spaces, instead building all spatial and category priors from data or synthetic generation.
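The masked-reconstruction (MAE-style) pretraining described above hinges on randomly hiding most patch tokens and reconstructing them from the visible remainder. A minimal sketch of the masking step, assuming token-level masking at a 75% ratio (the ratios and mechanics in the cited models may differ):

```python
import numpy as np

def random_mask(num_tokens, mask_ratio=0.75, seed=None):
    """MAE-style random masking: choose which patch tokens the encoder sees.

    Illustrative only; real models operate on learned patch embeddings,
    and the reconstruction loss is computed on the masked tokens.
    """
    rng = np.random.default_rng(seed)
    num_keep = int(num_tokens * (1 - mask_ratio))
    perm = rng.permutation(num_tokens)
    keep_idx = np.sort(perm[:num_keep])      # visible tokens fed to the encoder
    mask = np.ones(num_tokens, dtype=bool)   # True = masked / to be reconstructed
    mask[keep_idx] = False
    return keep_idx, mask

keep_idx, mask = random_mask(64, mask_ratio=0.75, seed=0)
print(len(keep_idx), int(mask.sum()))  # 16 48
```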

4. Evaluation Metrics, Benchmarks, and Empirical Performance

Performance of atlas-free voxel-level foundation models is assessed on diverse 3D benchmarks, often spanning multiple domains and tasks. Key metrics include the Dice similarity coefficient (DSC) for segmentation, AUROC and accuracy for classification, and r² for regression.

Parameter efficiency, memory/computation savings (e.g., dynamic patching in Omni-fMRI reduces attention FLOPs by ~10× (Wang et al., 30 Jan 2026)), and sample efficiency (e.g., SLIM-Brain achieves SOTA with only ~4K fMRI sessions (Wang et al., 26 Dec 2025)) are also reported.
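The adaptive-subsampling idea behind these efficiency gains can be illustrated with a toy top-k temporal window selector for a 4D fMRI run. Variance is used here as a hypothetical saliency score; the actual selection criterion in SLIM-Brain is not specified in this summary:

```python
import numpy as np

def topk_windows(fmri, win=20, k=4):
    """Keep only the k highest-scoring temporal windows of a 4D run.

    A sketch of dynamic subsampling: scoring and window length are
    illustrative, not the cited models' actual settings.
    """
    t = fmri.shape[0] // win * win
    # (num_windows, win, *spatial) non-overlapping temporal windows
    windows = fmri[:t].reshape(-1, win, *fmri.shape[1:])
    # Hypothetical saliency score: per-window signal variance
    scores = windows.reshape(windows.shape[0], -1).var(axis=1)
    top = np.argsort(scores)[::-1][:k]
    return windows[np.sort(top)]  # preserve temporal order of kept windows

run = np.random.rand(200, 8, 8, 8)      # toy run: time x voxel grid
sel = topk_windows(run, win=20, k=4)
print(sel.shape)  # (4, 20, 8, 8, 8)
```

Attention is then computed only over the retained windows, which is where the FLOPs savings come from.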

5. Theoretical Guarantees and Methodological Properties

Atlas-free foundation models often leverage theoretical properties to justify or explain their efficacy:

  • Distance preservation and random projections: Raptor’s use of the Johnson–Lindenstrauss lemma ensures slice-embedding distances are preserved after random planar tensor reduction, underpinning a formal guarantee on semantic geometry retention (An et al., 11 Jul 2025).
  • Hierarchical self-supervision: Anatomically-driven SSL models (Adam) enforce locality and compositionality in the learned feature space, leading to dense, part-whole-aware embedding manifolds (Taher et al., 2023).
  • Dynamic scale and masking: Theoretical trade-offs between compressiveness and locality are established via ablation and neural scaling studies, e.g., effect of patch-complexity thresholds in Omni-fMRI (Wang et al., 30 Jan 2026).
  • Generalization via synthetic diversity: vesselFM’s and SynthFM-3D’s zero-shot performance is attributed to extensive synthetic domain randomization and flow matching, which enriches the sampling of plausible 3D scenes and intensity distributions, enabling transfer without domain-adaptive modules (Wittmann et al., 2024, Chakrabarty et al., 18 Jan 2026).
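The Johnson–Lindenstrauss argument behind Raptor can be checked numerically: a suitably scaled Gaussian random projection approximately preserves pairwise distances between high-dimensional embeddings. The dimensions below are arbitrary for illustration, not Raptor's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 4096, 256            # 100 embeddings, projected 4096 -> 256 dims
X = rng.standard_normal((n, d))

# Gaussian random projection, scaled by 1/sqrt(k) so squared norms
# are preserved in expectation (the JL construction)
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Compare one pairwise distance before and after projection
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"distortion: {abs(proj / orig - 1):.3f}")  # small w.h.p. by the JL lemma
```

The lemma guarantees that, with high probability, all pairwise distances are distorted by at most a factor of 1 ± ε for k on the order of log(n)/ε².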

6. Practical Limitations, Extensions, and Future Directions

Despite their strengths, atlas-free voxel-level foundation models exhibit several practical and methodological limitations:

  • Resource constraints: Scaling to extremely large 3D volumes or high spatial resolutions remains memory- and compute-intensive, necessitating innovations such as hierarchical masking or dynamic patching (Wang et al., 30 Jan 2026, Wang et al., 26 Dec 2025).
  • Partial geometry handling: Some models (Raptor) assume approximate orthogonality or isotropy in slice features; highly anisotropic or irregular structures may degrade performance (An et al., 11 Jul 2025).
  • Task restriction: Binary segmentation (vesselFM), organ-specific adaptation, or lack of explicit modeling of very small-scale features (tiny vessels, fiber tracts) are noted limitations (Wittmann et al., 2024).
  • Lack of explicit inter-slice continuity models: Methods such as Raptor and slice-based pipelines do not explicitly encode local 3D neighborhood continuity beyond multi-view aggregation (An et al., 11 Jul 2025).

Potential extensions include:

  • Adaptive random projections and structured masking: To improve embedding efficiency or downstream task alignment (e.g., sparse Johnson–Lindenstrauss transforms in Raptor) (An et al., 11 Jul 2025).
  • Multi-modal, multi-resolution, and cross-domain fusion: Integrating clinical, anatomical, and imaging data to realize truly universal volumetric foundation models (Veenboer et al., 30 Nov 2025, He et al., 2024).
  • Generalization to new physics and sciences: Atlas-free design in crystallography/material informatics (Wei et al., 7 Dec 2025), or synthetic parameterization for transfer to unseen imaging or scientific domains (Chakrabarty et al., 18 Jan 2026).

7. Summary Table: Representative Atlas-Free Voxel-Level Foundation Models

| Model | Architecture | Training Paradigm | Application Domain | Notable Empirical Metrics |
|---|---|---|---|---|
| Raptor | 2D encoder + random projection | Train-free, random projection | 3D medical imaging (MRI/CT) | +3–14% AUROC vs. baselines; 0.8–0.9 r² regression (An et al., 11 Jul 2025) |
| TAP-CT | Volumetric ViT | DINOv2-style SSL | CT, multi-task | DSC = 0.582 vs. 0.489 (2D DINOv2) |
| Omni-fMRI | Dynamic-patch ViT | MAE with dynamic patching | fMRI | Outperforms NeuroSTORM, BrainLM |
| SLIM-Brain | 4D Hiera-JEPA | Window selection + JEPA | fMRI, 7 benchmarks | 91.1% ACC (sex), 98.5% (fingers), <2.4 GB |
| vox2vec | 3D FPN | Multi-scale contrastive | CT (organs/tumors) | Linear-probe Dice 69.2–75.5% |
| vesselFM | 3D U-Net | Real + synthetic + flow matching | 3D vessel segmentation | OCTA: 46.9 DSC zero-shot; cross-modality |
| VISTA3D | 3D U-Net w/ interactive head | Supervoxel distillation & interactive workflows | Multi-organ segmentation | Dice 0.711–0.85 zero-shot, 127 classes |

Each model listed demonstrates direct voxel-level inference in a fully atlas-free setting across diverse imaging modalities, yielding state-of-the-art performance in established evaluation regimes.


References: (An et al., 11 Jul 2025, Veenboer et al., 30 Nov 2025, Wang et al., 30 Jan 2026, Wang et al., 26 Dec 2025, Wittmann et al., 2024, He et al., 2024, Taher et al., 2023, Wei et al., 7 Dec 2025, Goncharov et al., 2023, Chakrabarty et al., 18 Jan 2026)
