Similarity-Regulated Contrastive Learning

Updated 18 February 2026
  • Similarity-Regulated Contrastive Learning (SRCL) is a family of methods that use continuous and structured similarity measures instead of binary labels to regulate contrastive loss.
  • SRCL employs heterogeneous similarity metrics—such as t-vMF, soft labels, and uncertainty weighting—to dynamically adjust margins and optimize mutual information between samples.
  • Empirical studies show that SRCL improves intra-class clustering, robustness to distribution shifts, and out-of-domain generalization compared to traditional approaches.

Similarity-Regulated Contrastive Learning (SRCL) refers to a family of methods that generalize the classical contrastive learning paradigm by leveraging graded or heterogeneous measures of similarity between samples to construct the loss. Unlike standard contrastive objectives that dichotomize sample relations into hard positives and negatives, SRCL admits continuous, structured, or multi-way similarity input, explicitly regulating the learning dynamics according to richer prior or data-driven notions of sample affinity. This class of approaches emerges both in supervised settings, where label-derived or expert-chosen similarity metrics are known, and in self-supervised or multi-modal settings, where similarities may be estimated or inferred from the data, augmentations, or auxiliary models. SRCL methods have shown strong empirical advantages in robustness to distribution shift, improved intra-class and semantic clustering, out-of-domain generalization, and principled handling of partial positives and false negatives.

1. Theoretical Foundations and Motivation

Classical supervised contrastive learning (SupCon) and InfoNCE-based self-supervision operate under a binary positive–negative view: given an anchor, similarity to positives (augmentations or same-class samples) is maximized, and the rest of the batch is treated as equally repulsive negatives. This formulation underpins models from SimCLR and MoCo to ALBEF and CLIP, and is tightly linked to maximizing a mutual information (MI) lower bound between anchors and their positives. However, this binary labeling fails in several key regimes:

  • Semantic similarities or group structure exist within nominal "negatives," leading to suboptimal repulsion of allied instances (e.g., partial false negatives, or batch negatives that overlap in class or caption) (Jiang et al., 2023).
  • Overfitting may arise to majority-group or in-domain regularities, degrading performance under subpopulation or domain shift, because uniform separation does not enforce robust clustering of rare or outlier subgroups (Kutsuna, 2023).
  • Multi-label, multi-metric, or multi-modal settings require encoding variable, multiple, or even uncertain definitions of similarity in one embedding space (Mu et al., 2023).
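
The binary view can be made concrete with a minimal InfoNCE sketch (illustrative only, assuming cosine similarity and a single hard positive; the cited models add deep encoders, augmentations, and large batches):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Classic InfoNCE: one hard positive, every other sample is an
    equally repulsive negative (the binary view that SRCL relaxes)."""
    candidates = np.vstack([positive, negatives])
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    a = anchor / np.linalg.norm(anchor)
    logits = candidates @ a / temperature
    # Cross-entropy with the lone positive at index 0.
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

Every negative contributes identically to the denominator regardless of how semantically close it is to the anchor; this is exactly the behavior that breaks down in the regimes above.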

SRCL formally generalizes the contrastive objective by weighting sample–sample relations in the loss according to some function w^*_{ij}, which may reflect label structure, learned or frozen similarity models, or external side information such as text or augmentation strength. In practice, SRCL can be viewed as maximizing a more nuanced information-theoretic quantity, frequently a difference of MI terms or an uncertainty-weighted sum, rather than an MI lower bound restricted to gold positives.
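
A minimal sketch of this weighted objective, assuming a simple normalized target distribution w over the batch (the cited methods construct w^*_{ij} from labels, teacher models, or side information; this generic form is illustrative):

```python
import numpy as np

def srcl_weighted_loss(anchor, batch, w, temperature=0.1):
    """Similarity-regulated loss sketch: the one-hot positive indicator
    of InfoNCE is replaced by a soft target distribution w over the batch."""
    z = batch / np.linalg.norm(batch, axis=1, keepdims=True)
    a = anchor / np.linalg.norm(anchor)
    logits = z @ a / temperature
    log_probs = logits - np.log(np.exp(logits).sum())
    w = w / w.sum()                  # normalize the target weights
    return -(w * log_probs).sum()    # cross-entropy against the soft target
```

With a one-hot w this reduces to InfoNCE; graded w interpolates toward relational matching.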

2. Core Methodologies in SRCL

A representative spectrum of SRCL methodologies includes:

a) Heterogeneous Similarity in Supervised Losses

The SRCL method in "Supervised Contrastive Learning with Heterogeneous Similarity for Distribution Shifts" (Kutsuna, 2023) introduces positive- and negative-specific similarity metrics, parameterized as t-vMF similarities φ_κ(·,·) with different shape parameters (κ_p, κ_n). This induces a variable angular margin, enforcing that negative–anchor angles exceed positive–anchor angles by at least ε > 0. Varying the reparameterization α that controls κ_p and κ_n induces a data-dependent regularization margin that tightly clusters positives and robustly separates negatives.
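
A sketch of the t-vMF similarity, assuming the standard form φ_κ(cos θ) = (1 + cos θ)/(1 + κ(1 − cos θ)) − 1, which reduces to cosine similarity at κ = 0; the heterogeneous-SupCon loss then evaluates it with κ_p for positives and κ_n for negatives:

```python
import numpy as np

def t_vmf_similarity(x, y, kappa):
    """t-vMF similarity: equals cosine similarity at kappa = 0 and decays
    more sharply with angle as kappa grows, so choosing different
    (kappa_p, kappa_n) induces an effective angular margin."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return (1.0 + cos) / (1.0 + kappa * (1.0 - cos)) - 1.0
```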

b) Soft and Weighted Contrastive Losses

SCE ("Similarity Contrastive Estimation") and X-Sample Contrastive Loss (XSCRL) exemplify losses where the one-hot labeling of positives is replaced by a continuous distribution w_{ij} over all positives and negatives, informed by semantic similarity, augmentations, or text graphs (Denize et al., 2022, Sobal et al., 2024, Denize et al., 2021). The loss is typically the cross-entropy between this target similarity distribution ("soft labels") and the model's predicted similarities, often leveraging a momentum or slowly updated encoder to provide stable similarity targets. These losses interpolate smoothly between classic InfoNCE and relational losses such as ReSSL.
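
The soft-target construction can be sketched in the spirit of SCE, assuming a teacher (momentum) encoder has already produced similarities over the batch; the interpolation weight lam and this exact target form are simplifications:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sce_targets(teacher_sims, positive_index, lam, temperature=0.1):
    """Soft contrastive targets: interpolate between the one-hot positive
    label and the teacher's similarity distribution over the batch."""
    one_hot = np.zeros_like(teacher_sims)
    one_hot[positive_index] = 1.0
    relational = softmax(teacher_sims / temperature)
    return lam * one_hot + (1.0 - lam) * relational
```

The student is then trained with cross-entropy against these targets; lam = 1 recovers the hard InfoNCE label and lam = 0 a purely relational (ReSSL-like) target.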

c) Weighting and Multi-Similarity Regularization

In Multi-Similarity Contrastive (MSCon) (Mu et al., 2023), embeddings are optimized according to multiple, task-specific similarity metrics, each regulated by an automatically learned uncertainty weight σ_c^2. Loss terms for uncertain metrics are downweighted, yielding improved robustness when some labels are noisy, and better out-of-domain generalization to held-out similarity tasks.
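
The uncertainty weighting can be sketched with the familiar homoscedastic multi-task form (a simplification; MSCon's exact parameterization may differ):

```python
import numpy as np

def uncertainty_weighted_total(losses, log_sigmas):
    """Uncertainty-weighted sum of per-metric contrastive losses: a noisy
    metric learns a larger sigma_c and is down-weighted, while the
    log-sigma term keeps sigma from growing without bound."""
    losses = np.asarray(losses, dtype=float)
    log_sigmas = np.asarray(log_sigmas, dtype=float)
    return float(((0.5 / np.exp(2.0 * log_sigmas)) * losses + log_sigmas).sum())
```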

d) Similarity-Aware Regularization in Recommendations and Market Forecasting

Relative Contrastive Learning (RCL) for sequential recommendation (Wang et al., 27 Apr 2025) and WSSCL for market forecasting (Vinden et al., 22 Feb 2025) implement dual- or multi-tiered similarity regulation. RCL uses both "strong" positives (same target) and "weak" positives (structurally or semantically similar but not identical sequences), weighting their influence to avoid dominance by spurious pseudo-positives. ContraSim (Vinden et al., 22 Feb 2025) uses continuous similarity weights derived from fine-grained data-augmentation actions, yielding an embedding in which a semantic gradient between anchor and augmentation is encoded.
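
The dual-tier idea can be sketched as a similarity-weighted InfoNCE in which strong and weak positives share the target mass unevenly (a simplified form; pos_weights is an assumed interface, not RCL's exact parameterization):

```python
import numpy as np

def relative_contrastive_loss(anchor, positives, pos_weights, negatives,
                              temperature=0.1):
    """Multi-tier positives: strong positives carry full weight, weak
    positives a reduced weight, so pseudo-positives cannot dominate."""
    z = np.vstack([positives, negatives])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    a = anchor / np.linalg.norm(anchor)
    logits = z @ a / temperature
    log_probs = logits - np.log(np.exp(logits).sum())
    w = np.zeros(len(z))
    w[: len(positives)] = pos_weights
    w = w / w.sum()
    return -(w * log_probs).sum()
```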

Below is a table summarizing the key design axes for several canonical SRCL methods:

| Method | Type of Similarity Regulation | Main Use Case |
| --- | --- | --- |
| Heterogeneous SupCon (Kutsuna, 2023) | Positive/negative-specific similarity via κ_p, κ_n | OOD/generalization under shift |
| SCE / XSCRL (Denize et al., 2022, Sobal et al., 2024) | Soft, distributional similarity (momentum encoder, text graph) | Vision SSL, transfer, retrieval |
| MSCon (Mu et al., 2023) | Multiple metric-weighted losses, uncertainty-adapted | Multi-metric, multi-label domains |
| RCL (Wang et al., 27 Apr 2025) | Dual-tier (strong/weak) positives, similarity-weighted InfoNCE | Sequential recommendation |
| ContraSim (Vinden et al., 22 Feb 2025) | Weighted pairwise loss with continuous, external similarity | Financial time series, clustering |

3. Analytical Properties and Regularization Effects

SRCL methods induce a variety of regularization effects, all rooted in the replacement of hard binary similarity with regulated, graded, or multi-headed similarity signals:

  • Implicit Data-Dependent Margin: Heterogeneous similarity (e.g., κ_p > κ_n, or a margin in t-vMF space) introduces an angular margin in representation space, acting as an implicit but data-adaptive regularizer that is especially effective under distribution shift (Kutsuna, 2023).
  • Robust Cluster Maintenance: Weighted and soft losses avoid the manifold collapse and class collision that plague hard negative suppression, preserving semantically meaningful intra-class and inter-class relations, as evidenced by downstream clustering and k-NN metrics (Denize et al., 2022, Sobal et al., 2024).
  • Improved OOD/Minority Performance: Substantial gains are observed on minority-group accuracy in subpopulation shifts and on out-of-domain test sets, with SRCL outperforming fixed-margin regularizers (ArcFace, LDAM), weight decay, and strong augmentation (Kutsuna, 2023, Mu et al., 2023).
  • Dynamic Similarity Regulation: MSCon's uncertainty-based weight updates and SCE's λ-weighted interpolation dynamically respond to task noise or uncertainty, down-weighting unreliable similarity inputs without collapsing to a single notion (Mu et al., 2023, Denize et al., 2022).
  • Efficient Mutual Information Allocation: In cross-modal or noisy-data settings, SRCL generalizes InfoNCE's MI optimization by throttling the MI reduction for putative negatives that are semantically close, thus retaining useful structural information (Jiang et al., 2023).

4. Implementation Strategies and Design Considerations

  • Similarity Function Choice: t-vMF similarities, cosine, dot product, or other (possibly learned) metrics are common. The flexibility in SRCL frameworks allows varying the sharpness, normalization, and context of the similarity computation for positives, negatives, or a full similarity graph (Kutsuna, 2023, Sobal et al., 2024).
  • Auxiliary/Reference Models: Momentum encoders or frozen pre-trained models are often used to compute more stable similarity distributions for soft-labeled losses, especially during early training or in noisy settings (Denize et al., 2022, Jiang et al., 2023).
  • Batch/Memory Construction: Large memory queues (e.g., MoCo with size 65,536) support more comprehensive similarity computation across instances. Dynamic update rules ensure the batch or queue reflects the evolving model (Kutsuna, 2023, Denize et al., 2022).
  • Sampling Mechanisms: SRCL methods employing multi-level positives or similarity graphs must construct and update the positive/negative pools per anchor depending on external metrics or batch statistics (e.g., sampling weak positives according to similarity for RCL) (Wang et al., 27 Apr 2025).
  • Loss Parameterization: Hyperparameters (e.g., α for margin growth, λ for the InfoNCE-versus-relational balance, temperature τ, uncertainty weights σ_c) require tuning but offer control over the spectrum of regularization and discrimination (Mu et al., 2023, Denize et al., 2022).
  • Optimization and Scheduling: Standard optimizers (Adam, SGD, LARS) are used with tailored scheduling (e.g., cosine decay, batch-level group re-sampling for group-DRO), fitting the base model (ResNet, ViT, GNN, recommendation backbone) and target task (Kutsuna, 2023, Mu et al., 2023).
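
The momentum-encoder and queue machinery several of these points rely on can be sketched as follows (illustrative linear "encoders" and sizes; real implementations use deep networks and queues of tens of thousands of keys):

```python
import numpy as np

class MomentumQueue:
    """Minimal MoCo-style machinery reused by many SRCL variants: an EMA
    copy of the encoder provides stable similarity targets, and a FIFO
    queue widens the pool over which similarities are computed."""

    def __init__(self, dim, queue_size, momentum=0.999, seed=0):
        rng = np.random.default_rng(seed)
        self.w_online = rng.normal(size=(dim, dim))  # stand-in for the online encoder
        self.w_momentum = self.w_online.copy()       # EMA (key) encoder
        self.m = momentum
        self.queue = np.zeros((queue_size, dim))
        self.ptr = 0

    def update_momentum_encoder(self):
        # theta_k <- m * theta_k + (1 - m) * theta_q
        self.w_momentum = self.m * self.w_momentum + (1 - self.m) * self.w_online

    def enqueue(self, keys):
        # FIFO insertion with wrap-around.
        idx = (self.ptr + np.arange(len(keys))) % len(self.queue)
        self.queue[idx] = keys
        self.ptr = (self.ptr + len(keys)) % len(self.queue)
```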

5. Empirical Results and Generalization Benchmarks

SRCL methods yield consistent gains on both classic and challenging benchmarks:

  • Distribution Shift Robustness: On CelebA (subpopulation shift), SRCL achieves worst-group accuracy of 86.6% with α = 0.1, outperforming strong baselines (SupCon: 74.9%, robust-DRO: 54.4%) (Kutsuna, 2023).
  • Domain Generalization: On Camelyon17-WILDS, SRCL improves OOD accuracy to 80.2% and 63.4% (vs. 72.4% and 56.9% for SupCon), and enhances zero-shot vision–language transfer (Jiang et al., 2023, Kutsuna, 2023).
  • Self-Supervised and Relational Gains: XSCRL surpasses SimCLR and CLIP in both in-domain and OOD benchmarks, particularly in low-data regimes and on foreground–background disentanglement tasks (e.g., ImageNet-9, MIT-States) (Sobal et al., 2024).
  • Multi-Similarity and Task Noise: MSCon maintains high accuracy in multi-metric regimes and down-weights noisy metrics, outperforming multitask cross-entropy and single-metric contrastive learning on both Zappos and MEDIC datasets (Mu et al., 2023).
  • Recommendation and Finance: RCL consistently outperforms augmentation-based and classic SCL approaches, with up to 6% absolute improvement on HR@5 and NDCG@10 metrics; ContraSim boosts market headline classification accuracy and yields embeddings whose clusters correspond to market movement without explicit supervision (Wang et al., 27 Apr 2025, Vinden et al., 22 Feb 2025).

6. Limitations, Open Questions, and Future Directions

  • Hyperparameter Sensitivity: The performance of SRCL often peaks at intermediate values of the regulation parameter (α, λ); overly aggressive margins or label smoothing can hurt in-domain accuracy or collapse representation granularity (Kutsuna, 2023, Denize et al., 2022).
  • Computational Considerations: For some implementations, constructing large similarity graphs or sample-pairwise weighting may entail increased memory or pre-processing overhead, although empirical studies confirm minimal runtime impact in practice (Sobal et al., 2024).
  • Scope of Applicability: When no substantial OOD distributional shift or group imbalance is present, SRCL provides little or no gain over classic contrastive learning (Kutsuna, 2023). Similarly, fully self-supervised extensions and the optimal design of similarity kernels remain open areas (Denize et al., 2022, Kutsuna, 2023).
  • Semantics of Similarity Assignments: How to aggregate or select among competing similarity judgments (labels, external encoders, learned metrics, augmentation graphs) in systematic, data-driven ways is not fully resolved. Learning w^*_{ij} online, fusing multi-modal similarity, and adapting regulation at the level of instance or cluster are key research directions (Sobal et al., 2024, Jiang et al., 2023, Mu et al., 2023).

7. Relation to Broader Contrastive and Similarity-Based Paradigms

SRCL generalizes both contrastive and relational learning approaches. Whereas InfoNCE and SupCon maximize MI via hard discrimination, and relational methods maximize alignment to a continuous soft graph, SRCL provides a spectrum, interpolating between pure discrimination and pure relational matching. In cross-modal, multi-label, and group-robust learning, SRCL leverages the full information content of sample–sample relations, enabling precise, context-sensitive regularization and improved generalization.

A plausible implication is that, as foundation models become increasingly multi-modal and required to generalize under heterogeneous or shifting distributions, similarity-regulated losses will become critical components of robust, interpretable, and data-efficient representation learning (Jiang et al., 2023, Sobal et al., 2024, Mu et al., 2023, Kutsuna, 2023).
