CL3AN-GNN: Curriculum-Guided Three-Stage Attention Network

Updated 10 February 2026
  • The paper introduces a novel graph neural network that integrates curriculum-guided feature learning with a three-stage attention mechanism for imbalanced node classification.
  • It employs a modular architecture with dual GCN/GAT backbones and curriculum-controlled subgraph selection to progressively manage feature complexity.
  • Empirical evaluations show significant gains (+3–5% improvement) in accuracy, macro-F1, and AUC, reducing minority misclassification from over 30% to under 10%.

Curriculum-Guided Feature Learning and Three-Stage Attention Network (CL3AN-GNN) is a graph neural network (GNN) framework designed to address imbalanced node classification. It integrates curriculum learning with a three-stage attention mechanism (Engage, Enact, Embed) that sequentially organizes feature extraction and aggregation. The model systematically progresses from structurally simple to complex graph patterns, with an adaptive curriculum-aligned loss that promotes stable gradient flow and robust representation of minority class nodes. Empirical evaluations across standard benchmarks show significant gains in accuracy, macro-F1, and AUC compared to state-of-the-art baselines (Fofanah et al., 3 Feb 2026).

1. Architectural Overview

CL3AN-GNN adopts a modular architecture composed of four sequential modules: feature extraction (dual GCN/GAT backbone), an embedding layer (linear projection and multi-head refinement), curriculum-guided three-stage attention (Engage, Enact, Embed), and a curriculum-weighted classifier. Input to the model is a graph G = (V, E) with node features X. Node and edge embeddings are formed using GCN and GAT, then projected to lower-dimensional representations z_v^{(0)} and z_{vu}^{(0)}. Over T training epochs, CL3AN-GNN dynamically constructs a subgraph G_t at each epoch t based on a monotonically increasing difficulty threshold θ_t, applies the three-stage attention pipeline, and backpropagates a curriculum-aligned total loss. The final node classification is produced by a softmax over the last latent node embedding.

The operational workflow is a high-level loop over epochs: compute the difficulty threshold θ_t, select the curriculum subgraph G_t, run the Engage, Enact, and Embed attention stages, and update parameters with the curriculum-aligned loss.
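A minimal Python sketch of this epoch loop, with the paper's components stubbed out as hypothetical callables (`select_subgraph`, `three_stage_attention`, and `curriculum_loss` are illustrative names, not the authors' API):

```python
# Illustrative training loop for CL3AN-GNN (structure only; the helper
# callables are hypothetical stand-ins for the paper's components).
def train(graph, features, labels, T, theta_schedule,
          select_subgraph, three_stage_attention, curriculum_loss, step):
    for t in range(T):
        theta_t = theta_schedule(t)               # monotone difficulty threshold
        g_t = select_subgraph(graph, theta_t)     # curriculum subgraph G_t
        z = three_stage_attention(g_t, features)  # Engage -> Enact -> Embed
        loss = curriculum_loss(z, labels, t)      # curriculum-aligned total loss
        step(loss)                                # backprop + optimizer update
```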

2. Curriculum-Guided Feature Learning

The curriculum is defined by per-node (D_v) and per-edge (D_{u,v}) difficulty functions, producing subgraphs G_t of ascending complexity over training. Simpler features are prioritized first: 1-hop neighborhood patterns, low-degree node attributes (local feature homogeneity), and class-separable node pairs (well separated in the initial GCN/GAT embedding space). The curriculum threshold θ_t increases monotonically, introducing harder structures in later epochs.

The aggregated node difficulty metric D_v combines structural difficulty terms with an embedding-based separability term that quantifies how well a node's class is distinguished in the initial GCN/GAT embedding space.

During training, each epoch t samples a curriculum subgraph G_t = (V_t, E_t), with V_t = { v : D_v ≤ θ_t } and E_t = { (u, v) : D_{u,v} ≤ θ_t }. This sequential exposure is foundational for stable learning under class imbalance, preventing early overfitting to majority classes.
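The selection step can be sketched in Python, assuming precomputed difficulty scores and a linear threshold schedule (the linear form is an assumption; the source only requires θ_t to increase monotonically):

```python
def theta(t, T, theta_min=0.1, theta_max=1.0):
    # Monotonically increasing difficulty threshold; the linear schedule
    # is an illustrative assumption, not taken from the paper.
    return theta_min + (theta_max - theta_min) * t / max(T - 1, 1)

def curriculum_subgraph(node_diff, edge_diff, t, T):
    # Keep nodes/edges whose precomputed difficulty D_v / D_{u,v}
    # falls below the current threshold theta_t.
    th = theta(t, T)
    v_t = {v for v, d in node_diff.items() if d <= th}
    e_t = {(u, v) for (u, v), d in edge_diff.items()
           if d <= th and u in v_t and v in v_t}
    return v_t, e_t
```

Early epochs thus expose only low-difficulty nodes; by the final epochs the threshold admits the full graph.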

3. Three-Stage Attention Mechanism

CL3AN-GNN's core novelty is the staged attention mechanism:

  1. Engage Stage: Focuses attention on reliably discriminative, simple features—low-degree, class-homogeneous, and well-separated node pairs—via self-attention with scaled dot products. This step anchors the representation using broad majority signals and clean examples from minority classes.
  2. Enact Stage: Refines attention to encompass medium-difficulty graph patterns such as multi-hop paths and heterophilic connections. Queries are formed from concatenated representations of the node and its first-stage context aggregate. Adjustable attention weights detect cross-type and boundary nodes.
  3. Embed Stage: Consolidates all accumulated signals into a final discriminative embedding yielding robust minority class delineation. This final pass integrates increasingly complex structures, yielding enhanced representation for imbalanced settings.

Mathematically, each stage s implements a scaled dot-product self-attention update of the form

α_{vu}^{(s)} = softmax_u( q_v^{(s)} · k_u^{(s)} / √d ),   z_v^{(s)} = Σ_{u ∈ N(v)} α_{vu}^{(s)} W^{(s)} z_u^{(s−1)},

where queries, keys, and values are stage-specific linear projections of the current embeddings.
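One stage's dense scaled dot-product update can be sketched with NumPy (weight matrices and dimensions are illustrative; the actual model restricts attention to curriculum-subgraph neighborhoods):

```python
import numpy as np

def attention_stage(Z, Wq, Wk, Wv):
    """One self-attention pass over node embeddings Z (n x d):
    scaled dot-product scores, row-wise softmax, weighted aggregation."""
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # scaled dot products
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # attention weights (rows sum to 1)
    return A @ V                                 # aggregated node update
```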

Stage-wise ablation demonstrates additive performance gains, with Engage alone offering substantial improvements, further increased by progressive inclusion of Enact and Embed stages.

4. Loss Function and Optimization

The curriculum-aligned loss is epoch- and class-sensitive, dynamically reweighting classes based on their running accuracy: slowly learned classes (typically minorities) receive larger weights, while a curriculum coefficient decays over the T training epochs, modulating the influence of class-specific error. The full loss combines the curriculum-scheduled cross-entropy with regularizers,

L_total = L_CE + λ_ent · L_ent + λ_div · L_div + λ_grad · L_grad,

with the λ coefficients scheduled to emphasize different loss components across the Engage, Enact, and Embed phases. Regularization includes entropy and diversity losses, and optionally a gradient-stability penalty L_grad to mitigate gradient explosion or vanishing.

This loss design explicitly adapts to per-epoch class learning progress, counteracting majority class bias as training progresses.
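A minimal sketch of the class-reweighting idea, assuming a simple weighting of the form 1 − running accuracy + floor (the exact formula is an assumption; only the qualitative behavior is described above):

```python
import numpy as np

def class_weights(running_acc, floor=0.1):
    # Classes with low running accuracy (typically minorities) get
    # larger weights; well-learned classes are down-weighted.
    w = 1.0 - np.asarray(running_acc, dtype=float) + floor
    return w / w.mean()

def weighted_cross_entropy(probs, labels, running_acc):
    # Per-node cross-entropy, scaled by the weight of each node's class.
    w = class_weights(running_acc)
    eps = 1e-12
    per_node = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(w[labels] * per_node))
```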

5. Benchmark Datasets and Evaluation Protocol

CL3AN-GNN is evaluated on eight benchmarks encompassing social, citation, and biological graphs:

Dataset Nodes Edges Classes Imbalance Ratio Feature Dim Heterophily
Cora 2,708 5,278 7 0.1–0.8 1,433 0.18
Citeseer 3,327 4,552 6 0.1–0.8 3,703 0.23
PubMed 19,717 88,648 3 0.1–0.8 5,414 0.21
Amazon Photo 7,650 119,081 8 0.1–0.8 745 0.42
Amazon Comp. 13,752 491,722 10 0.1–0.8 767 0.38
Coauthor CS 18,333 163,788 15 0.1–0.8 6,805 0.35
Chameleon 2,277 36,101 5 0.1–0.8 2,325 0.83
OGBN-Arxiv 169,343 1,166,243 40 0.1–0.8 128 0.29

Imbalance is controlled by artificially downsampling minority classes to achieve specific imbalance ratios (0.1–0.8). Experiments use a 70/10/20 train/validation/test split over 10 runs with fixed seeds; reported metrics are accuracy, macro-F1, and AUC-ROC. Hyperparameters (hidden dimensions, learning rate, dropout, weight decay, attention heads, training epochs) are validated per benchmark.
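The downsampling protocol can be sketched as follows, treating the choice of minority set and per-class sampling as assumptions about details the source leaves open:

```python
import random

def downsample_minorities(labels_by_node, minority_classes, rho, seed=0):
    """Keep a fraction rho of nodes in each minority class (rho in
    [0.1, 0.8] per the protocol); majority classes are left untouched."""
    rng = random.Random(seed)
    by_class = {}
    for v, c in labels_by_node.items():
        by_class.setdefault(c, []).append(v)
    kept = []
    for c, nodes in by_class.items():
        if c in minority_classes:
            k = max(1, int(rho * len(nodes)))
            kept.extend(rng.sample(nodes, k))
        else:
            kept.extend(nodes)
    return sorted(kept)
```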

6. Experimental Results and Analysis

6.1 Quantitative Performance

CL3AN-GNN demonstrates consistent, often substantial performance increases over the strongest baselines across all eight datasets. Representative results:

Method Cora Citeseer PubMed Amazon Photo Coauthor CS Chameleon OGBN-Arxiv Amazon Comp. (each cell: ACC/AUC/F1)
Best Baseline 0.835/0.973/0.829 0.778/0.928/0.745 0.823/0.963/0.815 0.918/0.939/0.905 0.889/0.986/0.882 0.818/0.923/0.807 0.579/0.907/0.561 0.889/0.988/0.826
CL3AN-GNN 0.950/0.998/0.949 0.907/0.995/0.892 0.877/0.961/0.872 0.929/0.989/0.922 0.953/0.998/0.923 0.855/0.963/0.829 0.598/0.945/0.585 0.916/0.996/0.891

Performance improvements are often in the range of +3–5% in primary metrics.

6.2 Ablation Studies

Stage-wise ablation on Cora, Citeseer, and Amazon Photo demonstrates the incremental impact of each pipeline component. The Engage stage alone notably improves accuracy; the progressive addition of Enact and Embed further increases final scores by 3–6% per stage, and the full curriculum-guided version adds a final 1–2% improvement.

6.3 Convergence and Stability

Curriculum-guided staged attention leads to more stable training dynamics:

  • Attention–gradient correlation increases with each successive stage.
  • Gradient variance slope decreases from ≈1.23 in Stage 1 (high variance) to ≈0.10 in Stage 3 (very stable).
  • The staged curriculum approach accelerates convergence and mitigates instability typical in imbalanced GNN training.

6.4 Representation Visualization

Qualitative analyses using t-SNE show minority-class cluster separation becomes more pronounced after the curriculum. Confusion matrices indicate a significant reduction in minority-class misclassification rates, from over 30% to under 10%.
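The per-class misclassification rates quoted above can be read directly off a confusion matrix; a small NumPy sketch:

```python
import numpy as np

def misclassification_rates(cm):
    # cm[i, j]: count of class-i nodes predicted as class j.
    # Returns, per class, the fraction of its nodes predicted incorrectly.
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1)
    correct = np.diag(cm)
    return 1.0 - correct / np.maximum(totals, 1.0)
```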

7. Implications and Context

CL3AN-GNN introduces a theoretically grounded methodology for curriculum learning in graph neural architectures by aligning feature exposure and aggregation complexity with model maturity. The sequential, staged attention mechanism is explicitly adapted to the challenges of imbalanced node classification, addressing class skew, training instability, and minority node misclassification.

A plausible implication is that staged curriculum attention, as implemented here, offers a template for future architectures requiring progressive induction of structural complexity, both in GNN and non-graph inductive paradigms. Its empirical gains in generalization, convergence rate, and stability suggest robustness to real-world label imbalance scenarios. Further exploration may establish extensions to heterogeneous, dynamic, or evolving graphs (Fofanah et al., 3 Feb 2026).
