
Graph Contrastive PU Learning

Updated 18 January 2026
  • The paper introduces a framework that integrates Positive-Unlabeled (PU) learning with graph contrastive pre-training to correct sampling bias.
  • It leverages the InfoNCE loss as an estimator of semantic similarity, using confidence-based weighting to mine true positives.
  • Empirical results demonstrate improved IID and OOD representation quality, validating the method’s effectiveness on multiple benchmarks.

Graph Contrastive PU Learning (GCL-PU) is a principled framework that integrates Positive-Unlabeled (PU) learning into graph contrastive pre-training. Traditional graph contrastive learning (GCL) relies on augmentation-induced positive pairs and treats all remaining pairs as negatives, which introduces substantial sampling bias by mislabeling many semantically similar pairs as negatives. By treating GCL as a PU learning task and leveraging the InfoNCE loss as a means for estimating the posterior probability of semantic similarity, GCL-PU sidesteps this bias and enables semantically guided self-supervision that has been empirically validated to improve both in-distribution (IID) and out-of-distribution (OOD) representation quality (Wang et al., 7 May 2025).

1. Problem Formulation: PU Learning in Graph Contrastive Pre-training

Let $x=(n,n')$ denote a pair of nodes from a graph, forming a contrastive sample. Two binary labels are defined: a semantic label $y \in \{+1,-1\}$, where $y(x)=+1$ if and only if $x$ is truly semantically similar (i.e., "positive"), and an observed label $o \in \{+1,-1\}$, where $o(x)=+1$ if and only if $x$ is labeled as positive by augmentation.

In traditional GCL:

  • $D^{\text{aug}+} = \{x \mid y=+1,\, o=+1\}$ (labeled positives: augmented pairs)
  • $D^{\text{aug}-} = \{x \mid o=-1\}$ (treated as negatives, with unknown semantics)

Viewing this as a PU problem:

  • $D_L^+ = D^{\text{aug}+}$ (labeled positives)
  • $D_U = D^{\text{aug}-}$ (unlabeled, containing both true positives $D_U^+$ and true negatives $D_U^-$)

A critical issue is that semantically similar, non-augmented pairs ($D_U^+$) are forced to act as negatives, driving apart the representations of genuinely similar nodes. This reveals a fundamental mismatch between the observed and semantic label structures in GCL when the positive-unlabeled nature of the data is not accounted for (Wang et al., 7 May 2025).
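As a concrete illustration of this bookkeeping, the pairs for two augmented views of $N$ nodes can be enumerated directly. The function name `split_pu_pairs` and the index convention (candidate $N+i$ is anchor $i$'s augmented partner) are ours, not the paper's:

```python
def split_pu_pairs(n_nodes):
    """Enumerate contrastive pairs for one anchor view of n_nodes nodes.

    Candidates for anchor i are indexed 0..n-1 (intra-view) and
    n..2n-1 (cross-view). The augmented pair (i, n + i) is the only
    labeled positive (o = +1); every other pair goes to the unlabeled
    pool (o = -1), which mixes unknown true positives and negatives.
    """
    labeled_pos = [(i, n_nodes + i) for i in range(n_nodes)]
    unlabeled = [(i, j) for i in range(n_nodes)
                 for j in range(2 * n_nodes)
                 if j != n_nodes + i and j != i]   # drop the trivial self-pair
    return labeled_pos, unlabeled
```

Note that the unlabeled pool is quadratic in $N$, which is why mislabeling even a small fraction of it as negative affects many semantically similar pairs.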

2. InfoNCE as a Positive-Unlabeled Estimator

InfoNCE, the prevailing contrastive loss, computes similarity between encoded node pairs:

$$s_\theta(u_i, v_j) = \exp\left(\frac{\cos(z_{u_i}, z_{v_j})}{\tau}\right)$$

The pairwise probability is then normalized:

$$P_{u_i, v_j} = \frac{s_\theta(u_i, v_j)}{\sum_{k=1}^N \left[ s_\theta(u_i, u_k) + s_\theta(u_i, v_k) \right]}$$

Originally, InfoNCE is justified as modeling a density ratio

$$f(x \mid c) \propto \frac{p(x \mid c)}{p(x)},$$

and, specifically for GCL, $s_\theta(n, n') \propto p(x \mid y=+1, o=+1)/p(x)$. The Invariance-of-Order (IOD) assumption asserts that this value preserves the ordering of $p(y=+1 \mid x)$. Therefore, $P_{n,n'} \approx p(y=+1 \mid x=(n,n'))$, meaning that the normalized InfoNCE similarity is order-consistent with the posterior probability of true semantic similarity (Wang et al., 7 May 2025).
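The two formulas above translate into a short NumPy sketch; the function name is ours, and masking the trivial self-similarity $s_\theta(u_i, u_i)$ is an assumption common in GRACE-style implementations rather than something stated here:

```python
import numpy as np

def info_nce_probs(z_u, z_v, tau=0.5):
    """Row-normalized pair probabilities P_{u_i, .} for two views.

    z_u, z_v: (N, d) embeddings of the same N nodes under two
    augmentations. Returns P of shape (N, 2N); P[i, N + i] is the
    augmentation-positive probability P_{u_i, v_i}.
    """
    # cosine similarity via L2-normalized embeddings
    zu = z_u / np.linalg.norm(z_u, axis=1, keepdims=True)
    zv = z_v / np.linalg.norm(z_v, axis=1, keepdims=True)
    s = np.exp(zu @ np.concatenate([zu, zv]).T / tau)  # s_theta(u_i, .)
    np.fill_diagonal(s[:, : len(zu)], 0.0)             # mask s_theta(u_i, u_i)
    return s / s.sum(axis=1, keepdims=True)            # each row sums to 1
```

Because each row of `P` is a proper distribution over candidate pairs, its entries can be read (under the IOD assumption) as order-consistent posterior estimates of semantic similarity.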

3. Semantically Guided InfoNCE (IFL-GCL) Loss

The classic InfoNCE loss averages negative log-likelihood over labeled positives:

$$L = -\,\mathbb{E}_{(n,n') \in D_L^+} \left[ \log P_{n,n'} \right]$$

After mining semantically similar pairs from the unlabeled pool ($D_U^+ = \{x \in D_U \mid s_\theta(x) > t_s\}$, for a threshold $t_s$), the positive set is expanded to $D_L^+ \cup D_U^+$. For each labeled positive, mined positives sharing the same anchor node $n$ are introduced with confidence weights:

$$\hat{s}(n,n'') = \frac{s_\theta(n,n'') - \min_{i,j} s_\theta(n_i, n_j)}{\max_{i,j} s_\theta(n_i, n_j) - \min_{i,j} s_\theta(n_i, n_j)}$$

The corrected local loss combines all positives for each anchor, weighted by a global factor $\beta$ and the normalized confidence:

$$L_{n,n'}^{\text{corr}} = -\log \left[ P_{n,n'} \cdot \prod_{(n, n'') \in D_U^+} P_{n,n''}^{\beta \cdot \hat{s}(n, n'')} \right]$$

The full semantically guided (IFL-GCL) loss:

$$L^{\text{corr}} = -\,\mathbb{E}_{(n, n') \in D_L^+} \log \left( P_{n, n'} \cdot \prod_{(n, n'') \in D_U^+} P_{n, n''}^{\beta \cdot \hat{s}(n, n'')} \right)$$

Where:

  • $P_{n,n'}$: probability assigned to the labeled (augmentation) positive
  • $D_U^+$: mined unlabeled positives
  • $\beta$: global weighting factor
  • $\hat{s}(n, n'')$: normalized similarity, used as a confidence weight (Wang et al., 7 May 2025)
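Putting the mining step, the min-max confidence weights, and the weighted product together, the corrected loss for one batch might look like the following sketch. The function name, the matrix layout (column $N+i$ holds the labeled positive of anchor $i$), and the hyperparameter values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def ifl_corrected_loss(s, t_s, beta):
    """IFL-GCL-style corrected loss from unnormalized similarities.

    s: (N, 2N) matrix of s_theta values; s[i, N + i] is the labeled
    (augmentation) positive of anchor i. Mined positives are unlabeled
    pairs with similarity above t_s, weighted by the min-max-normalized
    confidence s_hat and the global factor beta.
    """
    N = s.shape[0]
    P = s / s.sum(axis=1, keepdims=True)            # row-normalized probabilities
    s_hat = (s - s.min()) / (s.max() - s.min())     # confidence weights in [0, 1]
    total = 0.0
    for i in range(N):
        log_term = np.log(P[i, N + i])              # labeled positive
        for j in np.flatnonzero(s[i] > t_s):        # mined unlabeled positives
            if j != N + i:                          # skip the labeled pair itself
                log_term += beta * s_hat[i, j] * np.log(P[i, j])
        total -= log_term
    return total / N
```

Setting the threshold above every similarity disables mining, in which case the expression collapses to the plain InfoNCE loss over labeled positives.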

4. Learning Algorithm: IFL-GCL

The IFL-GCL framework is instantiated as follows:

  1. Generate two augmented graph views.
  2. Initialize model parameters and split labeled/unlabeled pairs.
  3. Warm-up using standard InfoNCE.
  4. Repeatedly:
    • Compute similarity for all unlabeled pairs, mining positives above a threshold.
    • Resample positives and negatives from updated sets.
    • Build the corrected loss from both true and mined positives, applying their confidence-based weights.
    • Update model parameters over several optimization steps.
  5. Return the trained encoder parameters.

This routine integrates continual mining and re-weighting of unlabeled samples, aligning training dynamics with the underlying semantic structure (Wang et al., 7 May 2025).
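A minimal, framework-free rendering of steps 1–5 follows. Gradient updates are elided (they require an autograd framework), and the additive-noise augmentation, encoder argument, and hyperparameter values are toy stand-ins rather than the paper's choices:

```python
import numpy as np

def ifl_gcl_routine(features, encode, rounds=4, warmup=1,
                    t_s=1.5, beta=0.5, tau=0.5, seed=0):
    """Schematic IFL-GCL loop: warm up with plain InfoNCE, then mine
    unlabeled positives each round and reweight them into the loss."""
    rng = np.random.default_rng(seed)
    noise = lambda x: x + 0.01 * rng.standard_normal(x.shape)  # toy augmentation
    view1, view2 = noise(features), noise(features)            # step 1: two views
    losses = []
    for r in range(rounds):                                    # step 4: repeat
        zu, zv = encode(view1), encode(view2)
        zu = zu / np.linalg.norm(zu, axis=1, keepdims=True)
        zv = zv / np.linalg.norm(zv, axis=1, keepdims=True)
        N = len(zu)
        s = np.exp(zu @ np.concatenate([zu, zv]).T / tau)
        np.fill_diagonal(s[:, :N], 0.0)                        # mask self-pairs
        P = s / s.sum(axis=1, keepdims=True)
        loss = -np.log(P[np.arange(N), N + np.arange(N)])      # plain InfoNCE
        if r >= warmup:                                        # after warm-up, mine
            s_hat = (s - s.min()) / (s.max() - s.min())        # confidence weights
            for i in range(N):
                for j in np.flatnonzero(s[i] > t_s):
                    if j != N + i:                             # skip labeled pair
                        loss[i] -= beta * s_hat[i, j] * np.log(P[i, j])
        losses.append(float(loss.mean()))
        # a real implementation would take optimizer steps on the loss here
    return losses
```

The warm-up rounds matter because the mined set $D_U^+$ is only as reliable as the similarities produced by the current encoder; mining too early would promote noisy pairs to positives.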

5. Empirical Results and Practical Implications

The benchmark evaluation includes node classification tasks (via linear probe and fine-tuning) on commonly used datasets for IID and OOD generalization: Cora, PubMed, CiteSeer, WikiCS, Computers, Photo, and the GOOD series (Twitch, Cora, CBAS). Baseline methods include DGI, COSTA, BGRL, MVGRL, GBT, GRACE, and GCA. IFL-GCL is applied atop GRACE (IFL-GR) and GCA (IFL-GC).

  • IFL-GR and IFL-GC achieve best or second-best accuracy on all nine benchmarks.
  • Relative to the base methods GRACE and GCA, average IID gains are 0.5–1.5%, with OOD gains of up to 9.05%.
  • Supplementing features with LLM-derived representations (Llama3.2, Qwen2.5) yields further improvements of 0.3–1.3%, confirming that more informative initial semantics facilitate superior positive mining.
  • These findings validate the systemic gains from semantically guided positive reweighting and indicate the broader impact of PU learning for graph self-supervision (Wang et al., 7 May 2025).

6. Significance and Outlook

By recasting GCL as a PU learning problem and exploiting InfoNCE as a proxy estimator for semantic similarity, the IFL-GCL method systematically corrects augmentation-induced sampling bias, enabling robust mining of semantically rich positives. The approach is empirically established to provide consistent improvements across diverse graph representation benchmarks and demonstrates compatibility with recent advances in LLM-based semantic enhancement. A plausible implication is that further research may extend such PU-driven correction strategies to other self-supervised paradigms where observed and semantic label spaces are systematically misaligned (Wang et al., 7 May 2025).
