Arctanh-Based InfoNCE: Temperature-Free Contrastive Loss
- The paper introduces a temperature-free loss by replacing traditional temperature scaling with an arctanh transformation, simplifying contrastive learning.
- The method yields robust, nonvanishing gradients that enable reliable optimization even in high-similarity regimes and diverse negative sample settings.
- Empirical evaluations show that the Free loss outperforms or matches tuned InfoNCE across image, graph, anomaly detection, language debiasing, and recommendation benchmarks.
Arctanh-Based InfoNCE is a temperature-free alternative to the standard InfoNCE loss used in contrastive learning, proposed by Kim & Kim (2024). It replaces the conventional temperature scaling of similarity logits with a mathematically principled mapping based on the inverse hyperbolic tangent (arctanh), thereby eliminating the temperature hyperparameter and simplifying the optimization pipeline. This modification yields robust, non-vanishing gradients, and the resulting loss outperforms or matches InfoNCE with carefully tuned temperatures across image, graph, anomaly detection, language debiasing, and sequential recommendation benchmarks (Kim et al., 29 Jan 2025).
1. Motivation and Background
Contrastive learning frameworks hinge on maximizing agreement between positive pairs (similar or augmented samples) while minimizing agreement with negative samples, commonly through the InfoNCE loss. Traditionally, InfoNCE employs a temperature parameter $\tau$ to rescale cosine similarity scores:

$$\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp(s^+/\tau)}{\exp(s^+/\tau) + \sum_{i=1}^{K} \exp(s_i^-/\tau)}$$

where $s = \langle z_a, z_b \rangle$ denotes the normalized dot product (cosine similarity) between embeddings.
The temperature $\tau$ is sensitive to architecture, batch size, data, and task. Incorrect selection leads to slow convergence or vanishing gradients, necessitating costly grid searches. Arctanh-Based InfoNCE addresses this constraint by deploying an arctanh mapping, thereby removing the need for temperature calibration and the associated experimental overhead.
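To make this sensitivity concrete, here is a minimal pure-Python sketch of the temperature-scaled InfoNCE loss (the names `infonce`, `s_pos`, `s_negs`, and the example similarity values are illustrative, not from the paper):

```python
import math

def infonce(s_pos, s_negs, tau):
    """Temperature-scaled InfoNCE: -log softmax of the positive logit.

    s_pos  -- cosine similarity of the positive pair, in (-1, 1)
    s_negs -- cosine similarities of the negatives
    tau    -- temperature; rescales all similarities before the softmax
    """
    logits = [s_pos / tau] + [s / tau for s in s_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# Identical similarities produce wildly different losses (and gradient scales)
# under different temperatures -- the sensitivity the arctanh mapping removes.
sharp = infonce(0.8, [0.1, 0.2, 0.0], tau=0.05)  # near-saturated softmax
flat = infonce(0.8, [0.1, 0.2, 0.0], tau=1.0)    # flat softmax, larger loss
```

The gap between `sharp` and `flat` for the same batch is exactly what grid searches over $\tau$ try to calibrate.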
2. Mathematical Formulation
The essential innovation is mapping the bounded cosine similarity $s \in (-1, 1)$ onto the entire real line using a log-odds transformation:

$$z = \operatorname{arctanh}(s) = \frac{1}{2}\log\frac{1+s}{1-s}$$

This transformation is equivalent to applying the standard logit function to the rescaled similarity $(1+s)/2$, ensuring well-scaled unbounded logits for the softmax without manual scaling.
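The logit equivalence can be checked numerically; a quick sketch (`logit` here is the standard log-odds function, defined locally for the check):

```python
import math

def logit(p):
    """Standard log-odds function, defined for p in (0, 1)."""
    return math.log(p / (1.0 - p))

# arctanh(s) equals half the logit of the rescaled similarity (1 + s) / 2,
# so the mapping inherits the logit's unbounded range with no temperature.
for s in (-0.9, -0.3, 0.0, 0.5, 0.99):
    assert math.isclose(math.atanh(s), 0.5 * logit((1.0 + s) / 2.0))
```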
The temperature-free Arctanh-Based InfoNCE loss for a positive similarity $s^+$ and negatives $s_1^-, \dots, s_K^-$ is:

$$\mathcal{L}_{\text{Free}} = -\log \frac{\exp(\operatorname{arctanh}(s^+))}{\exp(\operatorname{arctanh}(s^+)) + \sum_{i=1}^{K} \exp(\operatorname{arctanh}(s_i^-))}$$
For a single negative, the loss can be equivalently written in pairwise sigmoid form:

$$\mathcal{L}_{\text{Free}} = -\log \sigma\big(\operatorname{arctanh}(s^+) - \operatorname{arctanh}(s^-)\big)$$

where $s^+$ and $s^-$ are the positive and negative similarities, and $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid.
Since $\exp(\operatorname{arctanh}(s)) = \sqrt{(1+s)/(1-s)}$, the softmax denominator admits a closed form; under the convenient symmetry of $K$ negatives sharing a common similarity $s^-$, this yields:

$$\mathcal{L}_{\text{Free}} = -\log \frac{g(s^+)}{g(s^+) + K\, g(s^-)}$$

where $g(s) = \sqrt{\tfrac{1+s}{1-s}}$.
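The closed form rests on the identity $\exp(\operatorname{arctanh}(s)) = \sqrt{(1+s)/(1-s)}$, which a short pure-Python sketch can verify against the explicit softmax (function names are illustrative):

```python
import math

def g(s):
    """exp(arctanh(s)) in closed form: sqrt((1 + s) / (1 - s))."""
    return math.sqrt((1.0 + s) / (1.0 - s))

def free_loss_softmax(s_pos, s_negs):
    """Arctanh-based loss via explicit arctanh logits and softmax."""
    logits = [math.atanh(s_pos)] + [math.atanh(s) for s in s_negs]
    denom = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[0]) / denom)

def free_loss_closed(s_pos, s_negs):
    """Same loss written with the closed-form g(s)."""
    return -math.log(g(s_pos) / (g(s_pos) + sum(g(s) for s in s_negs)))

# Both routes agree, including the symmetric case of identical negatives.
a = free_loss_softmax(0.7, [0.1, -0.2, 0.3])
b = free_loss_closed(0.7, [0.1, -0.2, 0.3])
assert math.isclose(a, b)
```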
3. Gradient Properties and Theoretical Analysis
The temperature-scaled InfoNCE loss suffers from problematic gradient behavior. At large $s^+$ (high similarity), gradients remain non-vanishing for large $\tau$ (risking over-shooting), and become negligible for moderate $s^+$ with small $\tau$ (risking stagnation).
Arctanh-Based InfoNCE, in contrast, guarantees:
- Gradients that vanish only at the true optimum ($s^+ \to 1$), ensuring unambiguous convergence.
- Nonzero, smoothly decaying gradients elsewhere, avoiding dead zones in optimization regardless of the number of negatives $K$.
Explicitly, the gradient of the closed-form loss with respect to $s^+$ is:

$$\frac{\partial \mathcal{L}_{\text{Free}}}{\partial s^+} = -\frac{1 - p^+}{1 - (s^+)^2}, \qquad p^+ = \frac{g(s^+)}{g(s^+) + K\, g(s^-)},$$

which behaves favorably for all $s^+ \in (-1, 1)$ and $K \ge 1$.
In the pairwise setting, per-embedding gradients are expressed via

$$\frac{\partial \mathcal{L}_{\text{Free}}}{\partial s^+} = -\frac{\sigma(\Delta)}{1 - (s^+)^2}$$

with $\Delta = \operatorname{arctanh}(s^-) - \operatorname{arctanh}(s^+)$, which is finite for all similarities away from the boundaries $s = \pm 1$.
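The pairwise gradient can be sanity-checked against a central finite difference; a sketch of the single-negative sigmoid form, $-\log\sigma(\operatorname{arctanh}(s^+) - \operatorname{arctanh}(s^-))$, with illustrative values:

```python
import math

def sigma(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_free_loss(s_pos, s_neg):
    """-log sigmoid(arctanh(s+) - arctanh(s-)), the single-negative form."""
    return -math.log(sigma(math.atanh(s_pos) - math.atanh(s_neg)))

def grad_s_pos(s_pos, s_neg):
    """Analytic gradient -sigma(Delta) / (1 - (s+)^2),
    with Delta = arctanh(s-) - arctanh(s+)."""
    delta = math.atanh(s_neg) - math.atanh(s_pos)
    return -sigma(delta) / (1.0 - s_pos ** 2)

# Central finite difference matches the analytic expression.
s_pos, s_neg, h = 0.6, 0.1, 1e-6
numeric = (pairwise_free_loss(s_pos + h, s_neg)
           - pairwise_free_loss(s_pos - h, s_neg)) / (2.0 * h)
```

The gradient is strictly negative for interior similarities, i.e. the loss always pushes $s^+$ upward, consistent with the no-dead-zone claim.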
4. Implementation and Deployment
The algorithmic recipe is a minimal adaptation of standard contrastive pipelines, demonstrated via PyTorch pseudocode. For a batch of $N$ samples with two augmented views, one computes pairwise cosine similarities $s$, applies the mapping $z = \operatorname{arctanh}(s)$ (or equivalently $\tfrac{1}{2}\log\tfrac{1+s}{1-s}$), and feeds the resulting logits to cross-entropy. The core changes are:
- Removal of the temperature hyperparameter $\tau$.
- Application of arctanh-log-odds mapping to similarities.
- Softmax-based classification of positive pairs remains intact.
This approach retains the original InfoNCE code structure and complexity, facilitating seamless integration into existing frameworks (Kim et al., 29 Jan 2025).
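The recipe above can be sketched end-to-end; the paper demonstrates it in PyTorch, but the same steps (cosine similarities, arctanh logits, cross-entropy against the matching index) are shown here in dependency-free Python so the structure is explicit. All names, the epsilon clamp value, and the toy embeddings are illustrative assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def free_infonce_batch(view_a, view_b, eps=1e-6):
    """Temperature-free InfoNCE: row i of view_a is positive with row i of
    view_b; every other row of view_b serves as a negative. No tau anywhere."""
    losses = []
    for i, u in enumerate(view_a):
        # arctanh-mapped logits, clamped away from |s| = 1 for stability
        logits = [math.atanh(max(-1.0 + eps, min(1.0 - eps, cosine(u, v))))
                  for v in view_b]
        m = max(logits)  # log-sum-exp with max subtraction
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(-(logits[i] - log_denom))  # cross-entropy, target = i
    return sum(losses) / len(losses)

# Two tiny "augmented views" of a batch of three embeddings.
va = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]
vb = [[0.9, 0.0], [0.1, 0.9], [-0.8, 0.1]]
loss = free_infonce_batch(va, vb)
```

Swapping a real pipeline over amounts to replacing `sim / tau` with `atanh(clamp(sim))` before the cross-entropy call, which is the "few lines of modification" noted below.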
5. Empirical Evaluation Across Benchmarks
Kim & Kim provide empirical results across five representative domains, demonstrating consistent advantages of the Arctanh-Based InfoNCE loss—termed "Free"—without any temperature search:
| Task | Dataset/Config | Best Tuned InfoNCE | Free |
|---|---|---|---|
| Image Classification | Imagenette/ResNet-18 | 84.43% (k-NN) | 84.65% |
| Graph Representation | CiteSeer/GRACE/GCN | 67.33% (F1) | 67.95% |
| Anomaly Detection | CIFAR-10/MSC/ResNet-152 | 97.215 (AUC) | 97.279 |
| Language Debiasing | StereoSet/BERT-base | 80.6 (LM) | 81.0 |
| Sequential Recommendation | MovieLens-20M/DCRec | 0.1336 (HR@1) | 0.1360 |
These results indicate that the Free loss consistently matches or exceeds the best tuned temperature-based losses, and outperforms InfoNCE on a majority of tasks, metrics, and datasets assessed (Kim et al., 29 Jan 2025).
6. Practical Considerations and Limitations
Key practical implications include:
- Elimination of brittle trial-and-error tuning of $\tau$, reducing experimental complexity.
- Robust, non-vanishing gradients across all similarity regimes, independent of negative sample count or batch size.
- Implementation requires at most a few lines of modification in existing codebases.
Limitations and caveats include:
- The mapping amplifies numerical noise near $|s| = 1$; in practice, inputs are clamped to $[-1 + \epsilon,\ 1 - \epsilon]$ for a small $\epsilon$.
- Initial training stages may exhibit amplified gradient noise; warmup schedules or clipping may be beneficial.
- Theoretical analysis is as yet restricted to unit-norm embeddings using cosine similarity and the single-positive contrastive regime; extension to other scenarios is an open direction.
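The clamping caveat above is a one-line guard in practice; a minimal sketch (the helper name and the epsilon value are assumptions, not prescribed by the paper):

```python
import math

def safe_atanh(s, eps=1e-6):
    """arctanh with the input clamped to [-1 + eps, 1 - eps], so that
    perfectly aligned pairs (s = 1.0) yield a large finite logit
    instead of an infinity."""
    return math.atanh(max(-1.0 + eps, min(1.0 - eps, s)))

z = safe_atanh(1.0)  # finite, unlike math.atanh(1.0) which raises/diverges
assert math.isfinite(z)
assert math.isclose(safe_atanh(0.3), math.atanh(0.3))  # in-range inputs pass through
```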
7. Extensions and Future Research Directions
Kim & Kim suggest several avenues for further exploration:
- Learnable or adaptive scaling factors layered atop the arctanh transformation for dynamic modulation of gradient magnitude.
- Application to large-scale multi-modal systems (e.g., CLIP) and alternative contrastive learning protocols (MoCo, SimSiam).
- Analysis of the transformation’s influence on representation geometry and transfer learning downstream.
- Adaptation to multiple positives per anchor and hierarchical or structured contrastive settings.
A plausible implication is that removing hyperparameter sensitivity may not only simplify training, but also facilitate broader adoption and more reliable deployment of contrastive learning across modalities and architectures (Kim et al., 29 Jan 2025).