
Debiased InfoNCE for Robust Mutual Information Estimation

Updated 16 January 2026
  • Debiased InfoNCE is a modified contrastive loss that corrects inherent negative sampling bias to achieve faithful density-ratio and mutual information estimation.
  • It employs corrections such as auxiliary anchor classes, false-negative subtraction, and positive-unlabeled mining to ensure unbiased and consistent representation learning.
  • Empirical results show improved performance and fairness across recommendation systems, graph contrastive learning, and supervised metric tasks.

Debiased InfoNCE refers to a suite of principled modifications to the classic InfoNCE loss, targeting the systematic biases that arise from negative sampling, dataset confounders, or density-ratio indeterminacy in contrastive learning frameworks. These debiasing strategies span mutual information estimation, graph contrastive learning, supervised metric learning, and recommendation systems. While standard InfoNCE excels in learning structured density ratios, it incurs bias due to its inherent loss formulation and sampling procedures. Debiased variants are designed to achieve Fisher-consistent density-ratio estimation, unbiased mutual-information estimation, or robustness to dataset or sampling artifacts, with empirical benefits documented across several modalities.

1. Formal Definition and Bias in InfoNCE

InfoNCE is a contrastive loss originally formulated to lower-bound the mutual information $I(X;Y)$ of random variables $(X,Y)$ by discriminating a single positive sample among $K$ candidates:

$$L_{\mathrm{InfoNCE}}(\theta) = -\mathbb{E} \left[ \log \frac{e^{c_\theta(x_1,y)}}{\frac{1}{K}\sum_{j=1}^K e^{c_\theta(x_j, y)}} \right]$$

where the critic $c_\theta(x,y)$ scores compatibility, $(x_1,y)$ denotes a positive pair drawn from $p(x,y)$, and $(x_2, \ldots, x_K)$ are negatives drawn from $p(x)$. When $c_\theta(x,y) = \log r_\theta(x,y)$, one can interpret the negative loss as a form of $K$-way Jensen-Shannon divergence.

For any finite $K$, InfoNCE is a lower bound on $I(X;Y)$, with bias

$$\mathrm{Bias}_{\mathrm{InfoNCE}} = I(X;Y) - I_{\mathrm{InfoNCE}} = D(p(x|y) \,\|\, p(x)) - D_{K\text{-JS}}\big(p(x|y), p(x)\big)$$

which remains strictly positive for all finite $K$ (Ryu et al., 29 Oct 2025). As such, InfoNCE systematically underestimates mutual information.
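For concreteness, the bound can be estimated directly from a batch of critic scores. Below is a minimal NumPy sketch; the batching convention (positive pair in column 0, negatives in the remaining columns) is an assumption made for illustration, not a prescription from the literature.

```python
import numpy as np

def infonce_bound(scores):
    """Monte Carlo estimate of the InfoNCE lower bound on I(X;Y).

    scores: (B, K) array of critic values c_theta(x_j, y); by the
    convention assumed here, column 0 holds the positive pair (x_1, y)
    and columns 1..K-1 hold the negatives.
    """
    B, K = scores.shape
    # Numerically stable log-sum-exp over the K candidates per row.
    m = scores.max(axis=1, keepdims=True)
    logsumexp = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()
    # -L_InfoNCE = E[ c(x_1, y) - log( (1/K) * sum_j exp(c(x_j, y)) ) ]
    return float(np.mean(scores[:, 0] - logsumexp + np.log(K)))
```

Note that the estimate can never exceed $\log K$, which is one way to see why InfoNCE must underestimate large mutual information for any finite $K$.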

2. Debiasing via Auxiliary Classes: The InfoNCE-Anchor Approach

To eliminate the indeterminacy in learned density ratios, InfoNCE-anchor introduces an auxiliary anchor class in the underlying tensorized classification problem. Specifically, for two densities $q_1(x)$ (positive) and $q_0(x)$ (noise), $K+1$ classes are defined:

  • Class 0 (anchor): $q_0(x_1)\cdots q_0(x_K)$
  • Class $i \in \{1,\dots,K\}$: $q_1(x_i)\prod_{j \neq i} q_0(x_j)$

Class priors $p(0) = v/(K+v)$ and $p(i) = 1/(K+v)$ for $i \ge 1$ (with $v > 0$) are assigned. The posterior is modeled as

$$p(z \mid x_{1:K}) = \begin{cases} \dfrac{v}{v + \sum_{j=1}^K r^*(x_j)}, & z = 0 \\[6pt] \dfrac{r^*(x_z)}{v + \sum_{j=1}^K r^*(x_j)}, & 1 \le z \le K \end{cases}$$

where $r^*(x) = q_1(x)/q_0(x)$. Optimization of the InfoNCE-anchor objective (cross-entropy loss over classes) is Fisher-consistent, yielding $r_{\theta^*}(x) = r^*(x)$ (Theorem 3), removing the indeterminacy and enabling consistent density-ratio estimation (Ryu et al., 29 Oct 2025).
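The anchor posterior is straightforward to compute from ratio estimates. A minimal NumPy sketch follows; the function name and the default value of $v$ are illustrative choices, not fixed by the method.

```python
import numpy as np

def anchor_posterior(log_r, v=1.0):
    """Posterior over the K+1 classes in the InfoNCE-anchor construction.

    log_r: length-K array of log density-ratio estimates log r_theta(x_j).
    v:     anchor prior weight, so that p(0) = v / (K + v).
    Returns a length-(K+1) array [p(z=0), p(z=1), ..., p(z=K)].
    """
    r = np.exp(log_r)
    denom = v + r.sum()
    # Class 0 gets mass v; class z >= 1 gets mass r(x_z).
    return np.concatenate([[v / denom], r / denom])
```

With all ratios equal to one and $v = 1$, the $K+1$ classes are equiprobable, matching the uniform prior structure of the construction.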

3. Debiased InfoNCE in Recommendation and Pointwise Losses

In recommendation, negative sampling from the marginal $p_u$ often contaminates the denominator with false negatives, especially when positives (items with observed user interactions) are not completely observed. Debiased InfoNCE (Jin et al., 2023, Li et al., 2023) corrects this by analytically subtracting the expected contribution of false negatives. For user $u$, with positive fraction $\tau_u^+$ and negative fraction $\tau_u^-$, the empirical debiased denominator is

$$f_{\mathrm{debias},u} = \max \left\{ \frac{1}{\tau_u^-} \left( \frac{1}{N} \sum_{n=1}^N e^{\hat{y}_{uj_n}/\tau} - \tau_u^+\, \frac{1}{M} \sum_{m=1}^M e^{\hat{y}_{uk_m}/\tau} \right),\ e^{-1/\tau} \right\}$$

Debiased InfoNCE thus becomes

$$L_{\mathrm{InfoNCE}}^{\mathrm{debiased}} = -\mathbb{E}_u\, \mathbb{E}_{i \sim p_u^+} \left[ \log \frac{e^{\hat{y}_{ui}/\tau}}{e^{\hat{y}_{ui}/\tau} + \lambda f_{\mathrm{debias},u}} \right]$$

Unbiasedness is theoretically guaranteed by construction; empirical gains in recommendation (Recall@20, NDCG@20) consistently confirm the advantage of the debiased variant (1.7% improvement over InfoNCE; MINE+ up to 11.5%) (Jin et al., 2023, Li et al., 2023).
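The corrected denominator can be sketched for a single user as below. The argument names are illustrative, and the assumption $\tau_u^- = 1 - \tau_u^+$ (class fractions summing to one) is made here for simplicity.

```python
import numpy as np

def debiased_denominator(neg_scores, pos_scores, tau, tau_pos):
    """f_debias for one user: subtract the expected false-negative
    contribution from the sampled-negative average, clamped below by
    exp(-1/tau) to keep the denominator positive.

    neg_scores: scores y_hat_{u j_n} for N sampled "negatives"
    pos_scores: scores y_hat_{u k_m} for M known positives
    tau:        softmax temperature
    tau_pos:    tau_u^+, the positive-class fraction for user u
    """
    tau_neg = 1.0 - tau_pos  # assumes fractions sum to one
    neg_mean = np.mean(np.exp(neg_scores / tau))
    pos_mean = np.mean(np.exp(pos_scores / tau))
    corrected = (neg_mean - tau_pos * pos_mean) / tau_neg
    return max(corrected, np.exp(-1.0 / tau))
```

The clamp matters in practice: when the sampled negatives score far below the known positives, the analytic subtraction can drive the corrected average negative, and the floor $e^{-1/\tau}$ keeps the loss well-defined.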

4. Positive-Unlabeled Correction in Graph Contrastive Learning

In GCL, InfoNCE suffers from semantic bias when treating all non-augmented pairs as negatives, ignoring that some may be true positives (semantically similar by graph structure or attributes). Wang et al. reinterpret GCL as a Positive-Unlabeled (PU) learning problem and prove that InfoNCE scores $s_\theta(n, n')$ rank pairs by their probability of positivity (the “free lunch” theorem). After warm-up, pseudo-positive pairs among unlabeled negatives are extracted by thresholding $s_\theta$; the corrected likelihood objective then maximizes the probability of both labeled and mined positives, weighted by confidence $\hat{s}_\theta$ and a factor $\beta < 1$:

$$L_{n,n'}^{\mathrm{corr}} = -\log \left[ P_{n,n'} \prod_{(n,\, n'') \in D_U^+} (P_{n,n''})^{\beta\, \hat{s}_\theta(n,n'')} \right]$$

Empirical gains in node classification accuracy, especially in out-of-domain scenarios, support the value of semantically guided debiasing (up to +9.05 pp on GOODCBAS). Synergy with LLM-based features further enhances hidden-positive recovery (Wang et al., 7 May 2025).
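The structure of the corrected objective can be sketched numerically. The representation below, where each mined pseudo-positive is a `(probability, confidence)` pair, is a hypothetical simplification for illustration; in the actual method these quantities come from the model's softmax and the thresholded scores $\hat{s}_\theta$.

```python
import numpy as np

def pu_corrected_loss(p_labeled, mined_pairs, beta=0.5):
    """Corrected negative log-likelihood for one anchor pair.

    p_labeled:   model probability P_{n,n'} of the labeled positive pair.
    mined_pairs: list of (P_{n,n''}, s_hat) tuples for mined
                 pseudo-positives in D_U^+ (hypothetical representation).
    beta:        down-weighting factor (< 1) for mined pairs.
    """
    log_lik = np.log(p_labeled)
    for p, s_hat in mined_pairs:
        # Each mined pair contributes with exponent beta * s_hat.
        log_lik += beta * s_hat * np.log(p)
    return -log_lik
```

With no mined pairs this reduces to the ordinary per-pair InfoNCE-style negative log-likelihood, so the correction is a strict generalization.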

5. Debiased Losses in Supervised Contrastive Learning

Barbano et al. expose how dataset bias (e.g., spurious correlations) can undermine InfoNCE and SupCon, with positive samples grouped by bias rather than true class. They frame debiased contrastive learning as enforcing an $\varepsilon$-margin between positives and negatives,

$$\varepsilon\text{-SupInfoNCE} = -\sum_i \log \frac{\exp(s_i^+)}{\exp(s_i^+ - \varepsilon) + \sum_j \exp(s_j^-)}$$

with $\varepsilon > 0$ enforcing a minimal gap. The FairKL regularizer matches anchor-to-positive and anchor-to-negative distance distributions across bias-aligned and bias-conflicting sets, ensuring that learned representations are robust and minimize bias. Combined, $\varepsilon$-SupInfoNCE and FairKL achieve state-of-the-art debiasing on synthetic and realistic benchmarks (Biased-MNIST, Corrupted-CIFAR10, bFFHQ), with unbiased test accuracy up to $\sim 90.5\%$ (Barbano et al., 2022).
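The per-anchor margin loss can be sketched as follows; the score-array interface and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def eps_sup_infonce(pos_scores, neg_scores, eps):
    """Per-anchor epsilon-SupInfoNCE: each positive score s+ competes
    against its own margin-shifted copy exp(s+ - eps) plus all negatives,
    summed over the anchor's positives.
    """
    neg_sum = np.sum(np.exp(neg_scores))
    return float(sum(
        -(s - np.log(np.exp(s - eps) + neg_sum))  # -log softmax with margin
        for s in pos_scores
    ))
```

When the negatives are pushed far below the positives, each term approaches $-\varepsilon$, so the loss rewards exceeding the margin rather than merely ranking positives first.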

6. Unified Decision-Theoretic Framework and Implications

The consistent pattern across domains is that debiased InfoNCE is enabled by explicit correction mechanisms—anchor classes, analytical expectation subtraction, positive mining, or margin regularization. Theoretical properties center on Fisher consistency, unbiased mutual information estimation, and robust density-ratio recovery. Under a decision-theoretic framework, these corrections generalize beyond InfoNCE to chi-squared ($\chi^2$) plug-in estimators, $f$-divergence estimators, and more, via selection of proper scoring rules (strictly convex generating functions). InfoNCE-anchor, for example, is a cross-entropy (log-score) proper scoring rule, while other losses can be derived using alternative scoring functions (Ryu et al., 29 Oct 2025).

A plausible implication is that accurate MI estimation is neither necessary nor sufficient for superior representation learning performance; contrastive methods benefit predominantly from learning structured density ratios, not the exact $I(X;Y)$. Debiased InfoNCE is thus most crucial for tasks requiring valid mutual information measurement, statistical decision theory consistency, or fairness/robustness guarantees rather than representation utility per se.

7. Summary Table: Debiased InfoNCE Variants Across Modalities

| Modality | Debiasing Mechanism | Key Theoretical Property |
|---|---|---|
| Mutual Information Est. | Anchor class (InfoNCE-anchor) | Fisher-consistent, unbiased MI estimate (Ryu et al., 29 Oct 2025) |
| Recommender Systems | Analytic false-negative subtraction | Unbiased empirical loss for positives/negatives (Jin et al., 2023, Li et al., 2023) |
| Graph Contrastive | PU mining, score thresholding | Density-ratio recovery; semantic pair correction (Wang et al., 7 May 2025) |
| Metric/Supervised Vision | $\varepsilon$-margin, FairKL | Robustness to dataset confounders (Barbano et al., 2022) |

Taken together, debiased InfoNCE unifies contrastive and classification-based objectives under a principled framework, substantiating empirical and theoretical advances across information-theoretic, graph, supervised, and recommendation contexts.
