
Arctanh-Based InfoNCE: Temperature-Free Contrastive Loss

Updated 12 December 2025
  • The paper introduces a temperature-free loss by replacing traditional temperature scaling with an arctanh transformation, simplifying contrastive learning.
  • The method yields robust, nonvanishing gradients that enable reliable optimization even in high-similarity regimes and diverse negative sample settings.
  • Empirical evaluations show that the Free loss outperforms or matches tuned InfoNCE across image, graph, anomaly detection, language debiasing, and recommendation benchmarks.

Arctanh-Based InfoNCE is a temperature-free alternative to the standard InfoNCE loss used in contrastive learning, proposed by Kim & Kim (2024). It replaces the conventional temperature scaling of similarity logits with a mathematically principled mapping based on the inverse hyperbolic tangent (arctanh), thereby eliminating the temperature hyperparameter and simplifying the optimization pipeline. This modification results in robust, non-vanishing gradients and outperforms or matches InfoNCE with carefully tuned temperatures across image, graph, anomaly detection, language debiasing, and sequential recommendation benchmarks (Kim et al., 29 Jan 2025).

1. Motivation and Background

Contrastive learning frameworks hinge on maximizing agreement between positive pairs (similar or augmented samples) while minimizing agreement with negative samples, commonly through the InfoNCE loss. Traditionally, InfoNCE employs a temperature parameter $\tau$ to rescale cosine similarity scores:

$$L_\mathrm{inf} = -\mathbb{E}_x \left[\log \frac{\exp(\mathrm{sim}(x, x^+) / \tau)}{\sum_{x' \in \{x^+, x^-\}} \exp(\mathrm{sim}(x, x') / \tau)} \right]$$

where $\mathrm{sim}(\cdot)$ denotes cosine similarity (the normalized dot product).

The temperature $\tau$ is sensitive to architecture, batch size, data, and task. Incorrect $\tau$ selection leads to slow convergence or vanishing gradients, necessitating costly grid searches. Arctanh-Based InfoNCE addresses this constraint by deploying an arctanh mapping, thereby removing the need for temperature calibration and the associated experimental overhead.

2. Mathematical Formulation

The essential innovation is mapping the bounded cosine similarity values $u = \cos\theta \in (-1, 1)$ onto the entire real line using a log-odds transformation:

$$h(u) = 2\,\mathrm{arctanh}(u) = \log\left(\frac{1+u}{1-u}\right)$$

This transformation is equivalent to applying the standard logit function to a rescaled $\cos\theta$, ensuring well-scaled unbounded logits for softmax without manual scaling.
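The identity between the two forms of $h$ can be verified directly (a minimal sketch; the function names are ours, not from the paper's code):

```python
import math

# h(u) = 2*arctanh(u), written both ways to confirm the stated
# log-odds identity numerically.
def h(u: float) -> float:
    return 2.0 * math.atanh(u)

def h_logodds(u: float) -> float:
    return math.log((1.0 + u) / (1.0 - u))

for u in (-0.99, -0.5, 0.0, 0.5, 0.99):
    assert abs(h(u) - h_logodds(u)) < 1e-12
```

Note that both forms diverge as $u \rightarrow \pm 1$, which is why practical implementations clamp similarities away from the boundary.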

The temperature-free Arctanh-Based InfoNCE loss for one positive and $N-1$ negatives is:

$$L_\mathrm{arctanh} = -\mathbb{E}_x \left[\log \frac{\exp(h(\mathrm{sim}(x, x^+)))}{\sum_{x' \in \{x^+, x^-\}} \exp(h(\mathrm{sim}(x, x')))}\right]$$

For a single negative, the loss can be equivalently written in pairwise sigmoid form:

$$L_\mathrm{pair} = -\mathbb{E}_x \left[\log \sigma\big(h(s^+) - h(s^-)\big)\right]$$

where $s^+, s^-$ are the positive and negative similarities, and $\sigma(\cdot)$ is the sigmoid function.
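The single-negative equivalence between the softmax and sigmoid forms can be checked numerically (a sketch; all names below are ours, not from the paper's code):

```python
import math

def h(u):
    # log-odds mapping 2*arctanh(u)
    return 2.0 * math.atanh(u)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

s_pos, s_neg = 0.7, 0.2  # illustrative similarity values

# Two-way softmax cross-entropy over the transformed logits...
softmax_form = -math.log(
    math.exp(h(s_pos)) / (math.exp(h(s_pos)) + math.exp(h(s_neg)))
)
# ...reduces to the pairwise sigmoid form.
sigmoid_form = -math.log(sigmoid(h(s_pos) - h(s_neg)))

assert abs(softmax_form - sigmoid_form) < 1e-12
```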

A closed-form expression for the loss follows under a symmetric configuration in which all $N-1$ negatives share the similarity $\cos\theta_{ij^-} = -\cos\theta_{ii^+}$:

$$L_i = -\log \frac{(1+C)^2}{(1+C)^2 + (N-1)(1-C)^2}$$

where $C = (\cos\theta_{ii^+} - \cos\theta_{ij^-}) / 2$.
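This closed form can be checked numerically under the symmetric configuration it assumes, namely a positive at similarity $+C$ and all $N-1$ negatives at $-C$ (a sketch with illustrative names):

```python
import math

def h(u):
    return 2.0 * math.atanh(u)

def direct_loss(C, N):
    # Direct evaluation of the arctanh-based softmax loss with one
    # positive at similarity +C and N-1 negatives at -C.
    pos = math.exp(h(C))
    neg = math.exp(h(-C))
    return -math.log(pos / (pos + (N - 1) * neg))

def closed_form(C, N):
    return -math.log((1 + C) ** 2 / ((1 + C) ** 2 + (N - 1) * (1 - C) ** 2))

for C in (0.1, 0.5, 0.9):
    for N in (2, 128):
        assert abs(direct_loss(C, N) - closed_form(C, N)) < 1e-9
```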

3. Gradient Properties and Theoretical Analysis

The temperature-scaled InfoNCE loss exhibits problematic gradient behavior: with a large $\tau$, gradients remain nonvanishing even at high similarity ($C \rightarrow 1$), risking overshooting past the optimum, while with a small $\tau$ they become negligible already at moderate $C$, risking stagnation.

Arctanh-Based InfoNCE, in contrast, guarantees:

  • Gradients that vanish only at the true optimum ($C \rightarrow 1$), ensuring unambiguous convergence.
  • Nonzero, smoothly decaying gradients elsewhere, avoiding dead zones in optimization regardless of the number of negatives $N$.

Explicitly, for the closed-form loss with respect to CC:

$$\left|\frac{\partial L_i}{\partial C}\right| = \frac{4(N-1)(1-C)}{(1+C)\left[N(1-C)^2 + 4C\right]}$$

which is strictly positive for all $C \in (-1, 1)$ and vanishes only as $C \rightarrow 1$, for any number of negatives $N$.
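A quick finite-difference check of this gradient expression against the closed-form loss (function names are ours):

```python
import math

def L_i(C, N):
    # Closed-form arctanh-based loss from the previous section.
    return -math.log((1 + C) ** 2 / ((1 + C) ** 2 + (N - 1) * (1 - C) ** 2))

def grad_mag(C, N):
    # Stated magnitude of dL_i/dC.
    return 4 * (N - 1) * (1 - C) / ((1 + C) * (N * (1 - C) ** 2 + 4 * C))

C, N, eps = 0.3, 64, 1e-6
numeric = abs((L_i(C + eps, N) - L_i(C - eps, N)) / (2 * eps))
assert abs(numeric - grad_mag(C, N)) < 1e-5
```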

In the pairwise setting, with $p = \sigma(h(s^+) - h(s^-))$ denoting the model's probability of ranking the positive first, the per-similarity gradients are

$$\frac{\partial L}{\partial s^+} = -(1-p)\, h'(s^+), \qquad \frac{\partial L}{\partial s^-} = (1-p)\, h'(s^-)$$

with $h'(u) = 2/(1-u^2)$, which is finite for all $u$ away from the boundaries $\pm 1$.

4. Implementation and Deployment

The algorithmic recipe is a minimal adaptation of standard contrastive pipelines, demonstrated via PyTorch pseudocode. For a batch of size $B$ with two augmented views, one computes pairwise cosine similarities, applies the $2\,\mathrm{arctanh}$ mapping (equivalently $\log((1+u)/(1-u))$), and feeds the resulting logits to cross-entropy. The core changes are:

  • Removal of the $\tau$ hyperparameter.
  • Application of the arctanh (log-odds) mapping to similarities.
  • Softmax-based classification of positive pairs remains intact.

This approach retains the original InfoNCE code structure and complexity, facilitating seamless integration into existing frameworks (Kim et al., 29 Jan 2025).
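Since the paper's own PyTorch pseudocode is not reproduced here, the following is a minimal per-anchor sketch of the recipe above in plain Python; all function names, the clamp value, and the toy vectors are illustrative assumptions, not the authors' implementation:

```python
import math

EPS = 1e-6  # clamp margin near |u| = 1, as the paper's practical note suggests

def cosine(a, b):
    # Cosine similarity of two raw vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def h(u):
    # Temperature-free log-odds mapping, with a numerical clamp.
    u = max(min(u, 1.0 - EPS), -1.0 + EPS)
    return 2.0 * math.atanh(u)

def arctanh_infonce(anchor, positive, negatives):
    # Logits are the arctanh-mapped cosine similarities; the loss is
    # standard softmax cross-entropy with the positive as the target class.
    logits = [h(cosine(anchor, positive))] + [h(cosine(anchor, n)) for n in negatives]
    m = max(logits)  # log-sum-exp stabilization
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]

loss = arctanh_infonce([1.0, 0.0], [0.9, 0.1], [[-0.2, 1.0], [0.0, -1.0]])
assert loss > 0.0
```

In a batched PyTorch pipeline the same logic amounts to replacing `sim / tau` with `2 * torch.atanh(sim.clamp(-1 + eps, 1 - eps))` before the existing cross-entropy call, leaving the rest of the InfoNCE code untouched.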

5. Empirical Evaluation Across Benchmarks

Kim & Kim provide empirical results across five representative domains, demonstrating consistent advantages of the Arctanh-Based InfoNCE loss (termed "Free") without any temperature search:

| Task | Dataset/Config | Best InfoNCE | Free (Ours) |
|---|---|---|---|
| Image Classification | Imagenette / ResNet-18 | 84.43% (k-NN) | 84.65% |
| Graph Representation | CiteSeer / GRACE / GCN | 67.33% (F1) | 67.95% |
| Anomaly Detection | CIFAR-10 / MSC / ResNet-152 | 97.215 (AUC) | 97.279 |
| Language Debiasing | StereoSet / BERT-base | 80.6 (LM) | 81.0 |
| Sequential Recommendation | MovieLens-20M / DCRec | 0.1336 (HR@1) | 0.1360 |

These results indicate that the Free loss consistently matches or exceeds the best tuned temperature-based losses, and outperforms InfoNCE on a majority of tasks, metrics, and datasets assessed (Kim et al., 29 Jan 2025).

6. Practical Considerations and Limitations

Key practical implications include:

  • Elimination of brittle trial-and-error tuning of $\tau$, reducing experimental complexity.
  • Robust, non-vanishing gradients across all similarity regimes, independent of negative sample count or batch size.
  • Implementation requires at most a few lines of modification in existing codebases.

Limitations and caveats include:

  • The $2\,\mathrm{arctanh}(u)$ mapping amplifies numerical noise as $u \rightarrow \pm 1$; in practice, inputs are clamped to $|u| < 1 - \epsilon$.
  • Initial training stages may exhibit amplified gradient noise; warmup schedules or clipping may be beneficial.
  • Theoretical analysis is as yet restricted to unit-norm embeddings using cosine similarity and the single-positive contrastive regime; extension to other scenarios is an open direction.

7. Extensions and Future Research Directions

Kim & Kim suggest several avenues for further exploration:

  • Learnable or adaptive scaling factors layered atop the arctanh transformation for dynamic modulation of gradient magnitude.
  • Application to large-scale multi-modal systems (e.g., CLIP) and alternative contrastive learning protocols (MoCo, SimSiam).
  • Analysis of the transformation’s influence on representation geometry and transfer learning downstream.
  • Adaptation to multiple positives per anchor and hierarchical or structured contrastive settings.

A plausible implication is that removing hyperparameter sensitivity may not only simplify training, but also facilitate broader adoption and more reliable deployment of contrastive learning across modalities and architectures (Kim et al., 29 Jan 2025).
