
SCPL: Supervised Contrastive Prototype Learning

Updated 21 January 2026
  • SCPL is a method that integrates prototype-based geometry with supervised contrastive learning to provide explicit geometric control and enhanced feature clustering.
  • It uses fixed or learnable class prototypes as anchors in the embedding space, improving robustness and performance in imbalanced, noisy, and OOD conditions.
  • The unified SCPL framework extends naturally to semi-supervised and hierarchical setups, yielding greater sample efficiency and more interpretable representations.

Supervised Contrastive Prototype Learning (SCPL) is a training paradigm that integrates prototype-based geometry into supervised contrastive learning, providing explicit geometric control, improved robustness, sample efficiency, and principled equivalence to cross-entropy under certain conditions. SCPL achieves this by introducing class prototypes—fixed or learnable embeddings representing each class—into the contrastive objective. These prototypes act as anchors, alignment targets, and (potentially) alternative classifiers in the learned representation space. The result is a unified framework that subsumes traditional supervised contrastive loss, extends naturally to semi-supervised and hierarchical label regimes, and empirically outperforms standard approaches, especially in the presence of imbalance, label noise, adversarial examples, and out-of-distribution (OOD) data (Gill et al., 2023, Fostiropoulos et al., 2022, Lian et al., 2024, Gauffre et al., 2024, Li et al., 2024, Jeong et al., 11 Jun 2025).

1. Theoretical Foundations and Motivation

Standard supervised contrastive learning (SCL) aims to group samples of the same class by minimizing a temperature-scaled InfoNCE loss over pairs, encouraging intra-class compactness and inter-class separation in the feature embedding space. However, SCL lacks explicit control over the resulting geometric structure, and conventional cross-entropy (CE) classifiers constrain embedding flexibility by reducing the final layer to class logits, often of dimension $K \ll D$, where $D$ is the original feature dimension.

SCPL addresses these limitations by introducing prototypes $\{p_c\}_{c=1}^K \subset \mathbb{R}^D$—either fixed or learnable—as class-level anchors in the embedding space. The training objective explicitly pulls examples toward their assigned class prototype and pushes them away from others, either by treating prototypes as additional positives/negatives in the batch or directly using instance–center contrastive losses. This allows for engineered geometry (via fixed prototypes) or adaptive clustering (via learned prototypes), increasing representational flexibility, geometric control, and robustness (Gill et al., 2023, Fostiropoulos et al., 2022).

Theoretically, SCPL interpolates between SCL and CE. In the limit of infinitely many prototypes per class, the SCPL loss reduces to CE with a fixed, $\ell_2$-normalized classifier, plus an alignment regularizer. This establishes equivalence and enables geometric engineering in a manner inaccessible to standard contrastive or CE losses (Gill et al., 2023, Gauffre et al., 2024, Jeong et al., 11 Jun 2025).

2. Formal SCPL Frameworks and Loss Functions

2.1. Basic SCPL with Fixed or Learned Prototypes

The SCPL mini-batch loss augments a data batch $B$ with $n_w$ copies of each class prototype, denoted $\widetilde{B} = B \cup \{p_1, \dots, p_K\}_{n_w}$. The per-batch SCPL objective is

$$\mathcal{L}_{\mathrm{SCPL}}(B) = \sum_{i \in B} \frac{1}{n_{B,y_i} + n_w - 1} \sum_{\substack{z \in \widetilde{B},\ z \neq h_i \\ \mathrm{label}(z) = y_i}} -\log\frac{\exp(h_i^\top z / \tau)}{\sum_{\ell \in \widetilde{B},\, \ell \neq h_i}\exp(h_i^\top \ell / \tau)}$$

where $h_i$ is a normalized feature embedding, $p_c$ the (fixed or learned) prototype for class $c$, and $n_w$ the number of prototype copies (Gill et al., 2023). Prototypes are treated as extra (true) positives for their class and negatives otherwise. In the limiting case $n_w \gg n$, this loss converges to CE with a normalized, fixed classifier and an explicit alignment term.
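The batch construction above can be sketched in NumPy. This is a minimal illustrative implementation, not code from the cited papers; the function name `scpl_loss` and the loop structure are assumptions for clarity.

```python
import numpy as np

def scpl_loss(H, y, P, n_w=1, tau=0.1):
    """Sketch of the SCPL mini-batch loss.

    H : (n, d) L2-normalized sample embeddings
    y : (n,) integer class labels
    P : (k, d) L2-normalized class prototypes; n_w copies of each
        prototype are appended to the batch as extra positives/negatives.
    """
    n = H.shape[0]
    k = P.shape[0]
    # Augmented batch: samples followed by n_w copies of the prototypes.
    Z = np.vstack([H] + [P] * n_w)                    # (n + n_w*k, d)
    labels = np.concatenate([y] + [np.arange(k)] * n_w)
    sim = Z @ Z.T / tau                               # pairwise similarities
    loss = 0.0
    for i in range(n):                                # anchors: batch samples only
        mask = np.ones(len(Z), dtype=bool)
        mask[i] = False                               # exclude self from denominator
        log_den = np.log(np.exp(sim[i][mask]).sum())
        pos = mask & (labels == y[i])                 # same-class samples + prototypes
        # Mean over the n_{B,y_i} + n_w - 1 positives of -log softmax.
        loss += np.mean(log_den - sim[i][pos])
    return loss / n
```

With `n_w` large relative to the batch size, the prototype terms dominate the positives, which is the regime in which the loss approaches CE with a fixed normalized classifier.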

2.2. Instance–Center (Prototype) Contrastive Loss

Several SCPL variants use explicit instance–prototype contrastive objectives. Given feature embedding $z_i$ and class prototype $p_{y_i}$, the instance–center loss is

$$\ell_{\mathrm{inst\text{-}ctr}}(i) = -\log \frac{\exp\bigl(\mathrm{sim}(z_i, p_{y_i})/\tau\bigr)}{\sum_{c \neq y_i} \exp\bigl(\mathrm{sim}(z_i, p_c)/[\tau\, S_{y_i, c}]\bigr)}$$

where $S_{y_i, c}$ can modulate negative penalties—for example, by encoding label hierarchy similarities (Lian et al., 2024). Prototypes $p_c$ are learned, either initialized randomly or from semantic class descriptors, and updated via gradient descent.
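A minimal NumPy sketch of this instance–center loss for a single embedding follows; the function name `instance_center_loss` and the dense similarity matrix `S` are illustrative assumptions, not an implementation from the cited work.

```python
import numpy as np

def instance_center_loss(z, y, P, S, tau=0.1):
    """Instance-prototype contrastive loss for one normalized embedding.

    z : (d,) L2-normalized feature
    y : its integer class label
    P : (k, d) L2-normalized learned prototypes
    S : (k, k) similarity weights; S[y, c] rescales the temperature
        for negative class c (e.g., encoding label-hierarchy closeness).
    """
    sims = P @ z                                   # cosine similarity to each prototype
    pos = np.exp(sims[y] / tau)                    # numerator: own-class prototype
    neg_idx = np.arange(len(P)) != y               # denominator: all other classes
    neg = np.exp(sims[neg_idx] / (tau * S[y, neg_idx])).sum()
    return -np.log(pos / neg)
```

Setting `S` to all ones recovers a plain instance–center InfoNCE term; structured `S` softens or sharpens penalties on semantically close negatives.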

2.3. Hybrid and Generalized Losses

SCPL also subsumes forms such as ProjNCE (Jeong et al., 11 Jun 2025), which generalizes InfoNCE and SupCon by projecting features and classes via $g_+, g_-$ and aggregating both self-projection and negative-pair adjustment terms. In text classification and imbalanced settings, hybrid objectives combine compensated CE, prototype-based contrastive loss, and rebalanced sample mining strategies (e.g., simple-sampling and hard-mixup) (Li et al., 2024).

3. Geometric Engineering and Interpretability

SCPL enables explicit engineering of feature space geometry. By fixing prototype positions—e.g., to an equiangular tight frame (ETF)—one can force the learned class means to align with a desired geometry, including the ideal simplex-ETF associated with Neural Collapse (Gill et al., 2023). The $k \times k$ Gram matrix $G_* = P^\top P$ (for prototype matrix $P$) encodes target inter-class angles, and the batch mean matrix $G_M$ of class features empirically converges toward $G_*$ as prototype strength increases.

Custom, non-symmetric prototype configurations can boost minority–minority class separation or collapse specific class clusters for long-tail scenarios. The SCPL loss steers the feature means toward these engineered angles, providing control over intra- and inter-class margins not achievable with softmax or vanilla SCL (Gill et al., 2023).
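Constructing a simplex-ETF prototype set can be sketched in a few lines of NumPy. The helper name `simplex_etf` is illustrative; the construction (centering orthonormal directions and rescaling to unit norm) yields pairwise cosines of exactly $-1/(k-1)$, the target Gram matrix $G_*$ for Neural-Collapse-style geometry.

```python
import numpy as np

def simplex_etf(k, d, seed=0):
    """k unit-norm prototypes in R^d forming a simplex ETF:
    every pair of prototypes meets at cosine -1/(k-1)."""
    assert d >= k, "this simple construction needs d >= k"
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # k orthonormal columns in R^d
    M = U @ (np.eye(k) - np.ones((k, k)) / k)      # center the simplex at the origin
    P = np.sqrt(k / (k - 1)) * M                   # rescale columns to unit norm
    return P.T                                     # (k, d) prototype matrix
```

The Gram matrix `P @ P.T` of the returned prototypes has ones on the diagonal and $-1/(k-1)$ off-diagonal, which is the fixed target the SCPL loss steers class means toward.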

In learned prototype regimes, prototypes evolve as centroids of their respective classes. Their use as direct 1-nearest-neighbor classifiers (i.e., $y_{\mathrm{pred}} = \arg\max_c\, \mathrm{sim}(z, p_c)$) allows interpretable and efficient inference, especially in few-shot and hierarchical settings (Lian et al., 2024).
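The nearest-prototype decision rule is a one-liner; this tiny sketch (function name `prototype_predict` assumed for illustration) makes the inference path explicit.

```python
import numpy as np

def prototype_predict(Z, P):
    """1-nearest-prototype classifier: assign each L2-normalized
    embedding (rows of Z) the class of its most similar prototype
    (rows of P), using cosine similarity."""
    return np.argmax(Z @ P.T, axis=1)
```

Because inference reduces to one matrix product against $k$ prototypes, it scales linearly in the number of classes and requires no separate classification head.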

4. Practical Algorithmic Design and Sample Efficiency

SCPL inserts prototypes into the loss computation without restructuring the main architecture. For vision tasks, prototypes can be fixed vectors appended to each batch; for NLP, prototype vectors are often derived from linear classifier weights projected into contrastive space and jointly optimized (Gill et al., 2023, Li et al., 2024).

Key algorithmic elements:

  • No batch hard mining is needed—each sample's contrast only involves its nearest class and negative-class prototypes (Fostiropoulos et al., 2022).
  • SCPL is modular: it replaces the standard classification head with a prototype set and associated loss; any augmentations or additional modality-specific heads can be retained.
  • Prototypes can be fixed for geometric enforcement, learned per class and updated by backpropagation, or computed dynamically as batch, running, or robust (e.g., median) centroids (Jeong et al., 11 Jun 2025).
  • Hard-mixup and simple-sampling strategies can rebalance minority classes, improve hard example coverage, and further regularize the loss (Li et al., 2024).
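For the running-centroid option above, a common choice is an exponential-moving-average update followed by re-normalization. The following is a minimal sketch under that assumption; the function name `update_prototypes` and the momentum scheme are illustrative, not prescribed by the cited papers.

```python
import numpy as np

def update_prototypes(P, Z, y, momentum=0.9):
    """EMA prototype update from one batch: each class prototype
    drifts toward its batch centroid, then is re-normalized so
    prototypes stay on the unit sphere with the embeddings."""
    P = P.copy()
    for c in np.unique(y):
        centroid = Z[y == c].mean(axis=0)          # batch centroid of class c
        P[c] = momentum * P[c] + (1 - momentum) * centroid
    P /= np.linalg.norm(P, axis=1, keepdims=True)  # back to unit norm
    return P
```

Classes absent from the batch keep their previous prototype, which keeps the update stable under class imbalance and small batches.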

SCPL maintains computational scalability with $O(nk)$ inner products, is sample efficient by leveraging prototypes as stable anchors, and is robust to high-variance augmentation configurations (Fostiropoulos et al., 2022, Li et al., 2024).

5. Empirical Properties and Performance

SCPL consistently yields gains in adversarial robustness, OOD detection, and minority-class generalization. It outperforms conventional CE, SCL, and several hybrid methods under both clean and noisy conditions across vision, text, and semi-supervised learning (Gill et al., 2023, Fostiropoulos et al., 2022, Lian et al., 2024, Jeong et al., 11 Jun 2025, Li et al., 2024, Gauffre et al., 2024).

In vision experiments, SCPL with $n_w = 100$ prototype copies recovers ideal ETF geometry and achieves superior accuracy and class separation under dataset imbalance (CIFAR-10, ResNet-18). In text classification, networks with prototype-guided SCPL achieve higher macro-F1 and accuracy on imbalanced datasets than state-of-the-art LLMs and SCL-based approaches, particularly when employing logit priors, balanced hard-mixup, and calibrated prototype heads (Li et al., 2024). OOD detection error and FPR@95 for SVHN → LSUN/ImageNet are significantly reduced compared to standard heads (Fostiropoulos et al., 2022).

Key ablations demonstrate that disabling prototypes or the SCPL branch causes a 2–5 % drop in core metrics, and in few-shot learning, intra-cluster compactness and inter-cluster separation are consistently improved versus vanilla SCL (Lian et al., 2024).

6. Hyperparameters, Best Practices, and Pitfalls

Optimal SCPL configurations are highly task-dependent:

  • Temperature ($\tau$): Lower values focus on hard negatives but risk amplifying noise; higher values smooth inter-class similarities, possibly underemphasizing boundary cases. Effective ranges include $\tau \sim 0.07$ for vision and $[0.3, 1.0]$ for NLP (Jeong et al., 11 Jun 2025, Li et al., 2024).
  • Prototype update: Fixed prototypes enforce controlled geometry; learned prototypes provide adaptive centroids, robust to label noise and feature variance (Gill et al., 2023, Jeong et al., 11 Jun 2025).
  • Sample size and batch effects: Large batches stabilize centroid estimates and improve MI bounds; very small batches impair geometric consistency (Jeong et al., 11 Jun 2025).
  • Prototype regularization: Adding norm penalties ($\lambda \sim 0.1$) benefits robustness; excessive regularization hinders within-class adaptation (Fostiropoulos et al., 2022).
  • Balanced reweighting: Logit prior compensation and per-class balanced sampling are essential for long-tail recognition (Li et al., 2024).

Potential pitfalls include instability with overly large prototype sets, insufficient batch normalization for prototype computation, and poor alignment under highly misspecified geometry (e.g., inappropriate fixed ETF under structured data).

7. Extensions and Research Directions

Recent work generalizes SCPL along label hierarchy (LASCL), where class similarity matrices $S_{c,c'}$ enable hierarchical margin control and improved clustering for fine/coarse label regimes (Lian et al., 2024). Unified frameworks extend SCPL to semi-supervised learning, leveraging prototypes for both pseudo-labeling and anchor selection within a single contrastive loss, enabling better utilization of unlabeled data and efficient self-training (Gauffre et al., 2024).

Alternative prototype computation methods—kernel (Nadaraya–Watson), robust medians, teacher models, or warm-start clustering—yield improved performance in settings with label or feature corruption (Jeong et al., 11 Jun 2025). SCPL is also compatible with modern data augmentations and self-supervised methods.

The geometric, robust, and sample-efficient characteristics of SCPL position it as a central component of current and future contrastive representation learning, with cross-domain applicability and direct implications for understanding and controlling deep feature geometry.

