
Pseudo-Labeled Graph Condensation

Updated 22 January 2026
  • Pseudo-Labeled Graph Condensation is a technique that synthesizes compact graphs using latent pseudo-labels to preserve essential node embeddings for efficient GNN training.
  • It employs self-supervised learning and pseudo-label guided replay to optimize representation matching even in noisy or label-scarce environments.
  • Empirical evaluations show that PLGC achieves near-supervised performance on tasks like node classification and link prediction while significantly reducing the training graph size.

Pseudo-Labeled Graph Condensation (PLGC) is a graph dataset reduction paradigm that generates small, information-preserving synthetic graphs by leveraging pseudo-labels, enabling efficient graph neural network (GNN) training in both supervised and label-free, noisy, or weakly-labeled settings. PLGC encompasses self-supervised methods (as in "PLGC: Pseudo-Labeled Graph Condensation" (Nandy et al., 15 Jan 2026)) and pseudo-label-guided replay condensation for continual learning (as deployed in PUMA (Liu et al., 2023)). This entry outlines the principal methodologies and theoretical foundations of PLGC, presents detailed algorithmic procedures, analyzes empirical outcomes, and discusses implementation practices and limitations.

1. Motivation and Problem Setting

Graph condensation targets the replacement of a massive, costly-to-train graph $\mathcal{T} = (X, A, Y)$ of $N$ nodes with a compressed synthetic graph $\mathcal{S} = (X', A', Y')$ of $N' \ll N$ nodes such that a GNN trained on $\mathcal{S}$ preserves the predictive and representational statistics of $\mathcal{T}$. Classical supervised condensation requires dense, reliable ground-truth labels, optimizing:

$$\min_{A', X', Y'} \mathcal{L}_{\text{node}}\left(\mathrm{GNN}_\theta(A', X'),\, Y'\right)$$

However, real-world graphs frequently exhibit label scarcity, inconsistency, and noise. Under such conditions, supervised condensation misaligns class-conditional statistics, causing overfitting and poor generalization. PLGC reorients the condensation paradigm:

  • In the self-supervised variant (Nandy et al., 15 Jan 2026), condensation proceeds without $Y$, constructing latent pseudo-labels $\tilde Y$ (prototypes of node embeddings) and node-to-prototype assignments $Q$, matched by representation statistics.
  • In continual learning (Liu et al., 2023), pseudo-labels with high confidence are dynamically generated for unlabeled nodes, expanding the set of condensation targets and improving distributional matching.

This strategy enables PLGC to remain robust and informative when ground-truth annotations are unreliable or absent.

2. Methodological Foundations

PLGC consists of two primary algorithmic phases: latent pseudo-label construction and condensed-graph optimization (Nandy et al., 15 Jan 2026).

A. Pseudo-Label Construction

Pseudo-labels $\tilde Y \in \mathbb{R}^{K \times d}$ represent $K$ prototype centroids in embedding space, each assigned to graph nodes through a balanced assignment matrix $Q_{\mathcal{T}} \in \{0,1\}^{N \times K}$. The assignment enforces equitable representation:

$$Q_{\mathcal{T}}^\top \mathbf{1}_N = \tfrac{1}{K}\mathbf{1}_K \quad\text{and}\quad Q_{\mathcal{T}}\mathbf{1}_K = \tfrac{1}{N}\mathbf{1}_N$$

Under random augmentations $T_i(A, X)$, node embeddings $Z_i$ are computed, and soft assignments $Q_i$ are produced by solving a balanced, entropy-regularized linear program (Sinkhorn-Knopp scaling). For each batch node, a swapped-view prediction loss $\ell_{\text{swap}}$ aligns embeddings across views. The joint objective $\mathcal{L}_{\text{pseudo}}$ is backpropagated to update both the encoder and the prototypes.
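The balanced assignment step can be sketched in a few lines. The following is a minimal pure-Python illustration of Sinkhorn-Knopp scaling producing soft assignments whose rows sum to $1/N$ and whose columns approximately sum to $1/K$; the function name, temperature, and iteration count are illustrative choices, not taken from the paper, and a practical implementation would operate on GPU tensors.

```python
import math

def sinkhorn_assignments(scores, temperature=0.1, n_iters=3):
    """Balanced soft assignments via Sinkhorn-Knopp scaling.

    scores[i][k] is the similarity of node i to prototype k.
    Returns Q with rows summing to 1/N and columns (approximately)
    summing to 1/K, i.e. the balanced constraint from the text.
    """
    N, K = len(scores), len(scores[0])
    # Entropy-regularized transport kernel: exponentiate scaled scores.
    Q = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Normalize columns to sum to 1/K.
        col_sums = [sum(Q[i][k] for i in range(N)) for k in range(K)]
        Q = [[Q[i][k] / (K * col_sums[k]) for k in range(K)] for i in range(N)]
        # Normalize rows to sum to 1/N.
        row_sums = [sum(row) for row in Q]
        Q = [[Q[i][k] / (N * row_sums[i]) for k in range(K)] for i in range(N)]
    return Q
```

A few scaling rounds suffice in practice; the row constraint is exact after the final row normalization, while the column constraint is approached iteratively.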

B. Condensed Graph Optimization

After prototype convergence, synthetic features $X'$ are optimized so that each condensed node's embedding $z_{\mathcal{S}|k}$ approximates its prototype:

$$\mathcal{L}_{\text{rep}}(X') = \sum_{k=1}^{K} \left\| \tilde y_k - z_{\mathcal{S}|k} \right\|_2^2$$

In practice, the adjacency $A'$ is often fixed or omitted (empty or identity), focusing optimization on $X'$.
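A minimal sketch of this representation-matching phase, assuming (for illustration only) an identity encoder so that $z_{\mathcal{S}|k} = x'_k$ and the gradient of $\mathcal{L}_{\text{rep}}$ is available in closed form; real PLGC backpropagates through the trained GNN encoder instead.

```python
def rep_matching(X_syn, prototypes, lr=0.25, steps=100):
    """Minimize L_rep(X') = sum_k ||y_k - z_k||^2 by gradient descent.

    Illustrative simplification: the encoder is the identity, so the
    gradient w.r.t. each synthetic feature x_k is 2 * (x_k - y_k).
    """
    X = [list(x) for x in X_syn]
    for _ in range(steps):
        for k, (x, y) in enumerate(zip(X, prototypes)):
            X[k] = [xi - lr * 2.0 * (xi - yi) for xi, yi in zip(x, y)]
    return X
```

Under this toy encoder the synthetic features converge to the prototypes themselves, which is exactly the fixed point the objective encodes.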

C. Pseudo-Label Guided Condensation in Continual Learning

Within PUMA (Liu et al., 2023), PLGC operates over iterative tasks $k$:

  1. Initial condensation uses the available true labels to generate a preliminary condensed graph $\hat G_k$ via a class-conditional maximum-mean-discrepancy (MMD) loss between propagated real and synthetic features.
  2. Pseudo-label expansion trains a classifier on the replay buffer $M_{k-1} \cup \hat G_k$ to infer pseudo-labels for unlabeled nodes, selecting those with softmax confidence above a threshold $\tau$.
  3. Refined condensation incorporates the expanded label set into a repeated condensation loop, producing a replay buffer $M_k$ that facilitates efficient, edge-free continual training.
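Step 2 reduces to softmax thresholding over classifier logits. A hedged sketch, with illustrative names and data layout (a dict of per-node logits) that are ours rather than PUMA's:

```python
import math

def expand_pseudo_labels(logits, tau=0.9):
    """Assign pseudo-labels to unlabeled nodes whose softmax
    confidence exceeds the threshold tau.

    logits: dict mapping node_id -> list of class logits.
    Returns dict mapping node_id -> pseudo-label for accepted nodes.
    """
    accepted = {}
    for node, row in logits.items():
        m = max(row)  # subtract max for numerical stability
        exps = [math.exp(l - m) for l in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        conf = max(probs)
        if conf > tau:
            accepted[node] = probs.index(conf)
    return accepted
```

Only confidently classified nodes enter the expanded condensation target set; low-confidence nodes stay unlabeled.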

3. Formal Optimization Objectives and Algorithms

Self-supervised PLGC (Nandy et al., 15 Jan 2026)—for a given unlabeled graph $\mathcal{T} = (X, A)$ and compression ratio $r$:

  1. Set $K = N' = \lceil r N \rceil$.
  2. Alternate:
    • Pseudo-label learning: for each batch, sample augmentations, compute embeddings, solve Sinkhorn assignments, apply loss, update encoder/prototypes.
    • Condensation: holding the encoder and prototypes fixed, optimize $X'$ via the representation-matching objective.
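The alternation above can be illustrated end-to-end on fixed embeddings. This toy sketch replaces the Sinkhorn/encoder machinery with nearest-prototype assignment and mean updates (a k-means-style stand-in of our own devising), then reads the condensed features straight off the converged prototypes:

```python
def plgc_alternate(Z, K, rounds=5):
    """Toy PLGC alternation on fixed embeddings Z:
    (a) pseudo-label learning -> prototype centroids,
    (b) condensation -> synthetic features matched to prototypes.

    Real PLGC trains an encoder with balanced Sinkhorn assignments;
    here assignment is plain nearest-prototype on raw features.
    """
    protos = [list(Z[i]) for i in range(K)]  # init from first K nodes
    for _ in range(rounds):
        # (a) assign each node to its nearest prototype, update means.
        groups = {k: [] for k in range(K)}
        for z in Z:
            d = [sum((a - b) ** 2 for a, b in zip(z, p)) for p in protos]
            groups[d.index(min(d))].append(z)
        for k, g in groups.items():
            if g:
                protos[k] = [sum(col) / len(g) for col in zip(*g)]
    # (b) condensation: each synthetic node's feature matches a prototype.
    X_syn = [list(p) for p in protos]
    return protos, X_syn
```

With $K = N'$, each prototype directly seeds one condensed node, which is why the number of prototypes equals the synthetic graph size in step 1.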

PUMA's PLGC module (Liu et al., 2023)—for labeled data:

  1. One-time feature propagation: $F_k = L_k X_k$, where $L_k = D^{-1/2} A_k D^{-1/2}$.
  2. Wide MLP embeddings: $E_k = f_\theta(F_k)$ and $\tilde E_k = f_\theta(\tilde X_k)$.
  3. Task-wise condensation loss:

$$\ell_{\text{MMD}}(\hat G_k; \Theta) = \sum_{c \in C_k} r_{c,k} \left\| \mathrm{mean}(E_{c,k}) - \mathrm{mean}(\tilde E_{c,k}) \right\|_2^2$$

Optimized over $\tilde X_k, \tilde Y_k$, constrained by class ratios and the memory budget.
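Steps 1 and 3 can be sketched directly. The following pure-Python illustration implements the symmetric-normalized propagation and the class-conditional mean-matching loss; for simplicity the MLP $f_\theta$ is taken as the identity, and the function and argument names are ours, not PUMA's:

```python
import math

def propagate(A, X):
    """One-time feature propagation F = D^{-1/2} A D^{-1/2} X."""
    n = len(A)
    deg = [sum(A[i]) for i in range(n)]
    F = []
    for i in range(n):
        row = [0.0] * len(X[0])
        for j in range(n):
            if A[i][j]:
                w = A[i][j] / math.sqrt(deg[i] * deg[j])
                row = [r + w * x for r, x in zip(row, X[j])]
        F.append(row)
    return F

def mmd_loss(E_real, y_real, E_syn, y_syn, class_ratios):
    """Class-conditional mean matching:
    sum over classes c of r_c * ||mean(E_c) - mean(E~_c)||^2."""
    loss = 0.0
    for c, r in class_ratios.items():
        real = [e for e, y in zip(E_real, y_real) if y == c]
        syn = [e for e, y in zip(E_syn, y_syn) if y == c]
        mu_r = [sum(col) / len(real) for col in zip(*real)]
        mu_s = [sum(col) / len(syn) for col in zip(*syn)]
        loss += r * sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))
    return loss
```

Because propagation is done once up front, the condensation loop only touches the cheap MLP and the per-class means, which is what makes the edge-free replay buffer fast to optimize.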

Both frameworks employ edge-free condensed graphs to accelerate memory replay, training downstream models via MLPs.

4. Theoretical Foundations

PLGC incorporates rigorous guarantees for prototype concentration and assignment fidelity (Nandy et al., 15 Jan 2026). Under sub-Gaussian latent structure and cluster separability:

  • Prototype concentration: each learned prototype $\tilde y_k$ converges to a neighborhood of its true cluster mean $\mu_k$ at rate $\epsilon_k = 4\sigma \sqrt{(d + \log(2K/\delta))/s_k}$.
  • Interior-point recovery: all nodes sufficiently close to $\mu_k$ remain correctly assigned.
  • Separation: for large enough cluster size $s_k$, prototypes remain well separated.

A plausible implication is that, even in the complete absence of ground-truth labels, the synthetic condensed graph will preserve the latent geometry and feature/structural statistics critical for downstream tasks.

In pseudo-label guided approaches (Liu et al., 2023), including high-confidence inferred labels expands the class coverage for matching, further improving alignment between synthetic and real distributions.

5. Empirical Performance and Practical Implementation

PLGC is evaluated on node classification and link prediction across both transductive and inductive graphs (Cora, Citeseer, Ogbn-Arxiv, Flickr, Reddit) (Nandy et al., 15 Jan 2026). Key results:

  • On clean-label datasets, PLGC is within 1% of best supervised methods and exceeds all self-supervised baselines by up to 10 points.
  • Under label noise (noise rate $> 0.7$), supervised methods degrade by up to 30 pp, whereas PLGC degrades by less than 5 pp, outperforming baselines by 15–25 points.
  • For multi-source graphs, supervised baselines collapse, while PLGC maintains performance within 5 points of clean conditions.
  • Link prediction AUROC is similarly robust to noise and source heterogeneity.

PUMA's continual-learning PLGC replay buffer achieves state-of-the-art accuracy and backward transfer on class-incremental task protocols (Liu et al., 2023), substantially outpacing regularization methods, sampling-based replay, and previous condensation frameworks. Condensation and retraining times are on the order of minutes for large graphs (e.g., ~3 min for 170K nodes), far below those of naive replay alternatives.

Hyper-parameter settings and ablations in both references indicate stability across budget ratios (0.5%–1%), number of prototypes, augmentation strengths, Sinkhorn temperature, and learning rates.

| Dataset  | PLGC Accuracy (clean) | PLGC Degradation (noise = 0.7) | Best Supervised Baseline           |
|----------|-----------------------|--------------------------------|------------------------------------|
| Cora     | 81.6%                 | −4.5 pp                        | GEOM (83.6%)                       |
| Reddit   | 88.3%                 | −4.2 pp                        | GCond (86.4%)                      |
| Products | 74%                   | −3.7 pp                        | CaT (71%) (Liu et al., 2023)       |

6. Advantages, Limitations, and Implementation Recommendations

Advantages:

  • Label-free condensation for completely unlabeled graphs.
  • Noise robustness: avoids overfitting to spurious annotation errors in both supervised and self-supervised scenarios.
  • Label efficiency: minimal annotations suffice for downstream fine-tuning.
  • Multi-source extensibility: naturally condenses and integrates heterogeneous subgraphs.
  • Task transferability: condensed graphs function across node classification, link prediction, and graph-level tasks with minimal modification.

Limitations:

  • Adjacency structure is not learned explicitly; if downstream tasks are sensitive to edge topology, explicit adjacency matching may be required.
  • Alternating self-supervised training entails nontrivial computational overhead; however, condensation remains orders of magnitude faster than full-graph retraining at every hyperparameter point.

Implementation tips:

  • Use PyTorch Geometric or DGL for encoder sharing and algorithmic flexibility.
  • Match the number of prototypes $K$ to the desired compression ($N' = rN$), ensuring $s_k \gg d$ for statistical concentration.
  • Apply standard graph augmentation (0.1–0.2 edge drop, 10% feature masking) for invariant prototype learning.
  • Employ a Sinkhorn temperature of 0.05–0.2 and batch sizes of 256–1024 per assignment round.
  • Optimize pseudo-labels for ~200 epochs and condensed features for ~100 steps.
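The settings above can be collected into a single configuration; the key names below are our own illustrative choices (the values come from the tips), not an API from either paper:

```python
# Illustrative hyper-parameter defaults drawn from the tips above;
# key names are hypothetical, values reflect the reported stable ranges.
PLGC_CONFIG = {
    "budget_ratio": 0.01,         # r: 0.5%-1% of original nodes
    "edge_drop": 0.15,            # augmentation: 0.1-0.2 edge drop
    "feature_mask": 0.10,         # 10% feature masking
    "sinkhorn_temperature": 0.1,  # stable range 0.05-0.2
    "batch_size": 512,            # 256-1024 per assignment round
    "pseudo_label_epochs": 200,   # ~200 epochs of prototype learning
    "condensation_steps": 100,    # ~100 steps of feature optimization
}
```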

In summary, Pseudo-Labeled Graph Condensation provides a unified paradigm for efficient, robust, and minimally supervised graph reduction methods, yielding synthetic datasets that preserve latent geometric and predictive information under adverse labeling conditions and facilitating scalable GNN training for both static and continual learning scenarios (Nandy et al., 15 Jan 2026, Liu et al., 2023).
