Foundation-Model Embeddings

Updated 2 February 2026
  • Foundation-model embeddings are vector representations from large pretrained neural networks that capture rich semantic, structural, and hierarchical information.
  • They are employed for cold-start active learning, information retrieval, and recommendation by enabling efficient clustering and medoid-based sample selection.
  • Empirical studies reveal significant improvements in medical image segmentation and classification, highlighting reduced variance and enhanced performance in low-label settings.

Foundation-model embeddings are vector representations produced by large, pretrained neural networks—commonly referred to as "foundation models"—that have been trained via self-supervised or contrastive paradigms on massive data corpora. These embeddings have proven highly effective as generic feature spaces for downstream tasks, especially in data regimes where labeled samples are scarce or tasks are heterogeneous. Within the context of cold-start strategies in active learning, information retrieval, and recommendation systems, foundation-model embeddings serve as low-dimensional, information-rich alternatives to raw features, enabling more efficient sampling, clustering, and model initialization.

1. Definition and Properties of Foundation-Model Embeddings

Foundation-model embeddings are produced by deep neural models pretrained, typically on de-identified, internet-scale data for general domains (e.g., ImageNet) or on large, domain-specific corpora (e.g., RadImageNet and other multi-modal medical imaging collections). In most frameworks, the embeddings are extracted from the penultimate layer of these models and L₂-normalized prior to being used for clustering or similarity calculations (Yuan et al., 2024, Levy et al., 26 Jan 2026). Key differentiating factors include:

  • Dimensionality: Typically in the range of hundreds to thousands (e.g., 1024 for DenseNet-121, 2048 for ResNet-50).
  • Information content: Encapsulate semantic, structural, and often hierarchical information learned from the pretraining dataset.
  • Task-agnostic utility: Demonstrated to transfer well across tasks, particularly in label-scarce or cold-start scenarios.

Embeddings derived from domain-specialized foundation models (e.g., TorchXRayVision or CXR Foundation for chest X-rays) consistently outperform generalist backbones (e.g., ImageNet-pretrained models) in domain-specific tasks (Yuan et al., 2024, Levy et al., 26 Jan 2026).
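The extraction-plus-normalization step described above can be illustrated with a minimal NumPy sketch. Random vectors stand in for the penultimate-layer features of a pretrained backbone (a real pipeline would run the frozen model over the unlabeled pool); `l2_normalize` is a hypothetical helper name, not an API from the cited works.

```python
import numpy as np

def l2_normalize(E, eps=1e-12):
    """Row-wise L2 normalization so that cosine similarity between
    embeddings reduces to a plain dot product."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.maximum(norms, eps)

# Stand-in for penultimate-layer features of a pretrained backbone
# (e.g., 2048-d for ResNet-50); a real pipeline would compute these
# by a forward pass of the frozen model over each pool sample.
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 2048))
E_norm = l2_normalize(E)
```

After normalization every row has unit length, so Euclidean and cosine distances induce the same nearest-neighbor structure for the clustering steps that follow.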

2. Methodological Role in Cold-Start and Active Learning

In the cold-start phase of active learning—where no or few labels are available and model-driven uncertainty is meaningless—foundation-model embeddings are used to systematize the selection of diverse and representative samples for annotation. The canonical workflow involves:

  1. Feature Extraction: For each candidate x in the unlabeled pool U, compute f(x), the normalized foundation-model embedding.
  2. Clustering: Perform k-means clustering on {f(x) : x ∈ U} in the embedding space, incrementally increasing k to match the labeling budget B (Yuan et al., 2024).
  3. Representative Selection: In each cluster, select the medoid (the point closest to the cluster centroid), thus ensuring the selected batch covers the major modes of the unlabeled distribution.
  4. Subset Consistency: By constructing the cold-start set using a nested clustering procedure (increasing k), smaller-budget selections are guaranteed to form strict subsets of larger-budget selections (Yuan et al., 2024).
  5. Transition to Model-based Selection: Once a seed model is trained, downstream acquisition switches to uncertainty or diversity sampling driven by task-specific objectives.

This procedure yields initial labeled sets that are more stable, diverse, and representative of the data distribution compared to random or naive raw-feature clustering, especially when B is small.
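The workflow above can be sketched as follows. This is an illustrative approximation using scikit-learn, not the authors' exact incremental scheme: it re-clusters at each budget and carries earlier picks forward, which enforces the nested-subset property of step 4.

```python
import numpy as np
from sklearn.cluster import KMeans

def medoid_indices(E, labels, centers):
    """Index of the point closest to each cluster centroid (the medoid)."""
    picks = []
    for c in range(centers.shape[0]):
        members = np.flatnonzero(labels == c)
        d = np.linalg.norm(E[members] - centers[c], axis=1)
        picks.append(int(members[np.argmin(d)]))
    return picks

def cold_start_select(E, budgets, seed=0):
    """Medoid-based cold-start selection for increasing budgets.

    Re-clusters at each budget b and carries earlier picks forward, so
    every smaller-budget batch is a strict subset of every larger one."""
    chosen, selected = [], {}
    for b in sorted(budgets):
        km = KMeans(n_clusters=b, n_init=10, random_state=seed).fit(E)
        for m in medoid_indices(E, km.labels_, km.cluster_centers_):
            if m not in chosen and len(chosen) < b:
                chosen.append(m)
        selected[b] = list(chosen)
    return selected
```

With budgets such as [5, 10], the 5-sample batch is by construction contained in the 10-sample batch, matching the subset-consistency property described in step 4.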

3. Clustering Algorithms and Selection Schemes

Several clustering and sampling pipelines predicated on foundation-model embeddings have been proposed (Yuan et al., 2024, Levy et al., 26 Jan 2026, Mannix et al., 2023):

| Study | Embedding backbone | Clustering | Sampling method |
|---|---|---|---|
| (Yuan et al., 2024) | DenseNet-121, TXRV, CXRF, REMEDIS | Incremental k-means | Greedy medoid aggregation |
| (Levy et al., 26 Jan 2026) | ResNet-50 (RadImageNet) | k-means (k auto-selected via silhouette) | Medoid + farthest-point within cluster |
| (Mannix et al., 2023) | SimCLR (ResNet-18) | k-medoids on t-SNE 2D embedding | 1 medoid per cluster |

All methods cluster in the embedding space and use either Euclidean or cosine distance. Proportional budget allocation across clusters, combined with intra-cluster diverse selection (e.g., farthest-point), maximizes coverage and reduces sample redundancy.
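The medoid-plus-farthest-point scheme used within a cluster can be sketched as below; `pick_in_cluster` is a hypothetical helper name, and the greedy farthest-point rule is one common realization of "intra-cluster diverse selection", not the cited papers' verbatim code.

```python
import numpy as np

def pick_in_cluster(E, members, centroid, n_pick):
    """Select n_pick points from one cluster: the medoid first, then
    greedily the member farthest from everything already picked
    (farthest-point augmentation)."""
    d2c = np.linalg.norm(E[members] - centroid, axis=1)
    picks = [int(members[np.argmin(d2c)])]
    while len(picks) < min(n_pick, len(members)):
        # distance from each member to its nearest already-picked point
        D = np.linalg.norm(E[members][:, None] - E[picks][None, :], axis=2)
        farthest = int(members[np.argmax(D.min(axis=1))])
        if farthest in picks:  # all remaining members coincide with picks
            break
        picks.append(farthest)
    return picks
```

On a toy 1-D cluster with values 0, 1, 2, 3, 10, the medoid (closest to the mean 3.2) is the point 3, and the next two picks are the outlier 10 and then 0, which spread the selection across the cluster's extent.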

4. Empirical Performance Across Domains

Quantitative evaluation on medical-image classification and segmentation (Yuan et al., 2024, Levy et al., 26 Jan 2026), and semi-supervised class discovery in vision (Mannix et al., 2023), demonstrates the superiority of foundation-embedding-based cold-start sampling:

  • Medical image segmentation: On CheXmask and Montgomery CXR, Dice improved by 1–2.2 percentage points and Hausdorff distance was reduced by 4.75–4.8 mm relative to random sampling (Levy et al., 26 Jan 2026).
  • Classification: TXRV-clustering achieves AUPRC 0.557±0.082 vs. random 0.389±0.094, and F1 0.524±0.071 vs. random 0.447±0.100, at B = 20 (Yuan et al., 2024).
  • Segmentation: TXRV-clustering reaches DSC 0.244±0.031 vs. random 0.161±0.051 (Yuan et al., 2024).
  • Variance reduction: Far lower run-to-run variance versus random sampling, essential for reproducibility and robust AL initialization.

Consistent gains are seen in cold-start and low-data regimes, as well as improved downstream active learning curves.

5. Practical Guidelines and Implementation Recommendations

Recommendations distilled from empirical and methodological studies (Yuan et al., 2024, Levy et al., 26 Jan 2026):

  • Model selection: Use a foundation model pretrained as close to the downstream domain as possible.
  • Layer selection: Extract penultimate-layer (pre-logit) embeddings; apply L₂ normalization.
  • Clustering parameters: Select k automatically (e.g., by silhouette-score maximization). Budget-constrained selection may use incremental k-means or k-medoids.
  • Sampling allocation: Ensure proportional and diverse coverage by combining medoid selection and intra-cluster farthest-point augmentation as necessary.
  • Reproducibility: Fix random seeds; perform multiple runs to estimate variance; visualize embedding spaces and cluster assignments for sanity checking.
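The silhouette-based choice of k recommended above can be sketched with scikit-learn; `choose_k` is a hypothetical helper name, and exhaustive search over a small k range is one simple way to implement the recommendation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(E, k_range, seed=0):
    """Return the k in k_range maximizing the mean silhouette score."""
    best_k, best_s = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(E)
        s = silhouette_score(E, labels)
        if s > best_s:
            best_k, best_s = k, s
    return best_k
```

On data with clearly separated modes, the silhouette score peaks when k matches the number of modes, which is the behavior the auto-selection step relies on.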

6. Limitations and Domain-Specific Considerations

Despite robust performance, foundation-model embedding approaches have several constraints:

  • Domain specificity: Foundation models pretrained on unrelated domains (e.g., ImageNet for medical scans) underperform domain-specialist models (TXRV, CXRF) (Yuan et al., 2024).
  • Dimensionality reduction: The difficulty of clustering in high-dimensional embedding spaces is often mitigated via visualization-driven t-SNE projections; however, for large N these projections can be computationally intensive (Levy et al., 26 Jan 2026, Mannix et al., 2023).
  • Scalability: Embedding extraction costs O(N · c_f), where c_f is the cost of one forward pass through the frozen model, with downstream k-means clustering at O(NkT) for T iterations. In practice, these pipelines are tractable for datasets up to N ≈ 10⁵.

7. Extensions, Variants, and Future Research

Variants of the core pipeline include alternative clustering algorithms (e.g., spectral clustering, affinity propagation), other self-supervised embedding backbones (e.g., SimCLR, MoCo), and hybrid cold-start acquisition functions (combining clustering and model uncertainty) (Mannix et al., 2023). Future directions emphasized in the literature include:

  • Automated domain-adaptive selection of pretrained models.
  • End-to-end optimization of embeddings and cold-start selection functions.
  • Theoretical characterization of coverage and sample-efficiency guarantees for embedding-based initializations.

Foundation-model embeddings as a cold-start clustering substrate have become a standard and empirically validated approach for low-label and domain-specialized active learning pipelines across medical imaging, general vision, and beyond (Yuan et al., 2024, Levy et al., 26 Jan 2026, Mannix et al., 2023).
