CL3AN-GNN: Curriculum-Guided Three-Stage Attention Network
- The paper introduces a novel graph neural network that integrates curriculum-guided feature learning with a three-stage attention mechanism for imbalanced node classification.
- It employs a modular architecture with dual GCN/GAT backbones and curriculum-controlled subgraph selection to progressively manage feature complexity.
- Empirical evaluations show significant gains (+3–5% improvement) in accuracy, macro-F1, and AUC, reducing minority misclassification from over 30% to under 10%.
Curriculum-Guided Feature Learning and Three-Stage Attention Network (CL3AN-GNN) is a graph neural network (GNN) framework designed to address imbalanced node classification. It integrates curriculum learning with a three-stage attention mechanism (Engage, Enact, Embed) that sequentially organizes feature extraction and aggregation. The model systematically progresses from structurally simple to complex graph patterns, with an adaptive curriculum-aligned loss that promotes stable gradient flow and robust representation of minority class nodes. Empirical evaluations across standard benchmarks show significant gains in accuracy, macro-F1, and AUC compared to state-of-the-art baselines (Fofanah et al., 3 Feb 2026).
1. Architectural Overview
CL3AN-GNN adopts a modular architecture composed of four sequential modules: feature extraction (dual GCN/GAT backbone), embedding layer (linear projection and multi-head refinement), curriculum-guided three-stage attention (Engage, Enact, Embed), and a curriculum-weighted classifier. Input to the model is a graph $G=(V,E)$ with node features $X \in \mathbb{R}^{N \times F}$. Node and edge embeddings are formed using GCN and GAT, then projected to lower-dimensional representations $H_v$ and $H_e$. Through training epochs, CL3AN-GNN dynamically constructs a subgraph $G_t$ at each epoch $t$ based on a monotonically increasing difficulty threshold $\tau(t)$, then applies the three-stage attention pipeline and backpropagates a curriculum-aligned total loss. The final node classification is produced by a softmax over the last latent node embedding.
The operational workflow is encapsulated in the following high-level loop:
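As a concrete illustration, the loop can be sketched in Python. The names `tau` and `train_loop`, the linear threshold schedule, and the toy difficulty scores are placeholder assumptions, not the authors' implementation.

```python
import numpy as np

def tau(t, T, tau_min=0.2, tau_max=1.0):
    """Monotonically increasing difficulty threshold (linear schedule assumed)."""
    return tau_min + (tau_max - tau_min) * t / max(T - 1, 1)

def train_loop(node_difficulty, T=5):
    """Per epoch: select the curriculum subgraph, then (in the full model)
    run the Engage/Enact/Embed attention pipeline and backpropagate the
    curriculum-aligned loss on it."""
    subgraph_sizes = []
    for t in range(T):
        thr = tau(t, T)
        subgraph_nodes = [v for v, d in enumerate(node_difficulty) if d <= thr]
        # ... three-stage attention and loss computation would run here ...
        subgraph_sizes.append(len(subgraph_nodes))
    return subgraph_sizes

# The curriculum subgraph only grows as tau(t) increases.
sizes = train_loop(np.linspace(0.0, 1.0, 10), T=5)
```

The key invariant is that the per-epoch subgraph is monotonically non-decreasing in size, so earlier (easier) material is never withdrawn as harder structures are introduced.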
2. Curriculum-Guided Feature Learning
The curriculum is defined by per-node ($d_v$) and per-edge ($d_e$) difficulty functions, producing subgraphs $G_t \subseteq G$ of ascending complexity over training. Simpler features are prioritized: 1-hop neighborhood patterns, low-degree node attributes (local feature homogeneity), and class-separable node pairs (separated in the initial GCN/GAT embedding space). The curriculum threshold $\tau(t)$ increases monotonically with epoch $t$, introducing harder structures in later epochs.
The aggregated node difficulty metric combines these cues, e.g. as a weighted sum
$$d_v(i) = \alpha\, d_{\mathrm{deg}}(i) + \beta\, d_{\mathrm{hop}}(i) + \gamma\, d_{\mathrm{sep}}(i),$$
where $d_{\mathrm{sep}}(i)$ quantifies embedding-based separability.
During training, each epoch $t$ samples a curriculum subgraph $G_t = (V_t, E_t)$ with $V_t = \{\, i : d_v(i) \le \tau(t) \,\}$ and $E_t = \{\, e : d_e(e) \le \tau(t) \,\}$. This sequential exposure is foundational for stable learning under class imbalance, preventing early overfitting to majority classes.
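A minimal numeric sketch of such an aggregated difficulty score follows; the weights `alpha`, `beta`, `gamma`, the max-normalization, and the input names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def node_difficulty(degree, hop_complexity, sep_distance,
                    alpha=0.4, beta=0.3, gamma=0.3):
    """Combine normalized structural cues into a per-node difficulty in [0, 1].

    sep_distance: inter-class separation of a node's initial GCN/GAT
    embedding; larger separation means an easier node, hence (1 - sep)."""
    deg = degree / degree.max()
    hop = hop_complexity / hop_complexity.max()
    sep = 1.0 - sep_distance / sep_distance.max()
    return alpha * deg + beta * hop + gamma * sep

# A low-degree, well-separated node scores as "easier" than a
# high-degree, poorly separated one.
d = node_difficulty(np.array([1.0, 10.0]),
                    np.array([1.0, 5.0]),
                    np.array([5.0, 1.0]))
```

Thresholding such scores against $\tau(t)$ yields the per-epoch node set $V_t$.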
3. Three-Stage Attention Mechanism
CL3AN-GNN's core novelty is the staged attention mechanism:
- Engage Stage: Focuses attention on reliably discriminative, simple features—low-degree, class-homogeneous, and well-separated node pairs—via self-attention with scaled dot products. This step anchors the representation using broad majority signals and clean examples from minority classes.
- Enact Stage: Refines attention to encompass medium-difficulty graph patterns such as multi-hop paths and heterophilic connections. Queries are formed from concatenated representations of the node and its first-stage context aggregate. Adjustable attention weights detect cross-type and boundary nodes.
- Embed Stage: Consolidates all accumulated signals into a final discriminative embedding yielding robust minority class delineation. This final pass integrates increasingly complex structures, yielding enhanced representation for imbalanced settings.
Mathematically, each stage implements scaled dot-product self-attention updates via
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
with stage-specific constructions of the queries, keys, and values.
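Each stage's core update is standard scaled dot-product attention; a self-contained NumPy version is below, with the stage-specific query/key/value construction (e.g. Enact's concatenation of node and context representations) omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(Q, K, V):
    """Attn(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V

# With all-zero queries the attention weights are uniform, so each
# output row is the mean of the value rows.
V = np.arange(12.0).reshape(3, 4)
out = scaled_dot_attention(np.zeros((2, 4)), np.ones((3, 4)), V)
```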
Stage-wise ablation demonstrates additive performance gains, with Engage alone offering substantial improvements, further increased by progressive inclusion of Enact and Embed stages.
4. Loss Function and Optimization
The curriculum-aligned loss is epoch- and class-sensitive, dynamically reweighting classes based on running accuracy:
$$\mathcal{L}_{\mathrm{CE}}(t) = -\sum_{i \in V_t} w_{y_i}(t)\, \log p_{i, y_i}, \qquad w_c(t) = 1 + \lambda(t)\big(1 - \mathrm{acc}_c(t)\big),$$
where $\lambda(t)$ decreases from $\lambda_{\max}$ to $\lambda_{\min}$ over the $T$ training epochs, modulating the influence of class-specific error. The full loss consists of the curriculum-scheduled cross-entropy and regularizers:
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \beta_1(t)\,\mathcal{L}_{\mathrm{ent}} + \beta_2(t)\,\mathcal{L}_{\mathrm{div}} + \beta_3(t)\,\mathcal{L}_{\mathrm{grad}},$$
with the coefficients $\beta_i(t)$ scheduled to emphasize different loss components across the Engage, Enact, and Embed phases. Regularization includes entropy and diversity losses, and optionally a gradient stability penalty $\mathcal{L}_{\mathrm{grad}}$ to mitigate gradient explosion or vanishing.
This loss design explicitly adapts to per-epoch class learning progress, counteracting majority class bias as training progresses.
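The reweighting idea can be sketched as follows; the linear decay of `lam` and the exact form of `class_weights` are illustrative assumptions consistent with the description above, not the paper's formulas.

```python
import numpy as np

def lam(t, T, lam_max=1.0, lam_min=0.1):
    """Assumed linear decay of the curriculum coefficient over T epochs."""
    return lam_max - (lam_max - lam_min) * t / max(T - 1, 1)

def class_weights(running_acc, t, T):
    """Classes with low running accuracy get proportionally larger weights;
    the effect fades as lambda(t) decays."""
    return 1.0 + lam(t, T) * (1.0 - np.asarray(running_acc))

# Early in training the struggling class (30% running accuracy) is
# upweighted far more than the well-learned one (90%); by the final
# epoch the gap has largely closed.
early = class_weights([0.9, 0.3], t=0, T=10)
late = class_weights([0.9, 0.3], t=9, T=10)
```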
5. Benchmark Datasets and Evaluation Protocol
CL3AN-GNN is evaluated on eight benchmarks encompassing social, citation, and biological graphs:
| Dataset | Nodes | Edges | Classes | Imbalance Ratio $\rho$ | Feature Dim | Heterophily $h$ |
|---|---|---|---|---|---|---|
| Cora | 2,708 | 5,278 | 7 | 0.1–0.8 | 1,433 | 0.18 |
| Citeseer | 3,327 | 4,552 | 6 | 0.1–0.8 | 3,703 | 0.23 |
| PubMed | 19,717 | 88,648 | 3 | 0.1–0.8 | 5,414 | 0.21 |
| Amazon Photo | 7,650 | 119,081 | 8 | 0.1–0.8 | 745 | 0.42 |
| Amazon Comp. | 13,752 | 491,722 | 10 | 0.1–0.8 | 767 | 0.38 |
| Coauthor CS | 18,333 | 163,788 | 15 | 0.1–0.8 | 6,805 | 0.35 |
| Chameleon | 2,277 | 36,101 | 5 | 0.1–0.8 | 2,325 | 0.83 |
| OGBN-Arxiv | 169,343 | 1,166,243 | 40 | 0.1–0.8 | 128 | 0.29 |
Imbalance is controlled by artificially downsampling minority classes to achieve specific imbalance ratios $\rho$. Experiments use a 70/10/20 train/validation/test split over 10 runs with fixed seeds; reported metrics are accuracy, macro-F1, and AUC-ROC. Hyperparameters (hidden dimensions, learning rate, dropout, weight decay, attention heads, training epochs) are validated per benchmark.
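The downsampling protocol can be sketched as below; `downsample_to_ratio` is an illustrative helper (class with the largest count treated as majority, all others subsampled to $\rho$ times its size), not the authors' code.

```python
import numpy as np

def downsample_to_ratio(labels, rho, seed=0):
    """Keep all nodes of the largest class; subsample every other class
    to roughly rho times the majority-class count."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    majority_class = int(counts.argmax())
    target = max(1, int(rho * counts.max()))
    keep = []
    for c in range(len(counts)):
        idx = np.flatnonzero(labels == c)
        if c != majority_class and len(idx) > target:
            idx = rng.choice(idx, size=target, replace=False)
        keep.extend(int(i) for i in idx)
    return sorted(keep)

# Two balanced classes of 100 nodes each; rho = 0.2 leaves 100 majority
# and 20 minority training nodes.
labels = [0] * 100 + [1] * 100
kept = downsample_to_ratio(labels, rho=0.2)
```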
6. Experimental Results and Analysis
6.1 Quantitative Performance
CL3AN-GNN demonstrates consistent, substantial performance increases over the strongest baselines across all datasets. In representative results:
| Method | Cora (ACC/AUC/F1) | Citeseer | PubMed | Amazon Photo | Coauthor CS | Chameleon | OGBN-Arxiv | Amazon Comp. |
|---|---|---|---|---|---|---|---|---|
| Best Baseline | 0.835/0.973/0.829 | 0.778/0.928/0.745 | 0.823/0.963/0.815 | 0.918/0.939/0.905 | 0.889/0.986/0.882 | 0.818/0.923/0.807 | 0.579/0.907/0.561 | 0.889/0.988/0.826 |
| CL3AN-GNN | 0.950/0.998/0.949 | 0.907/0.995/0.892 | 0.877/0.961/0.872 | 0.929/0.989/0.922 | 0.953/0.998/0.923 | 0.855/0.963/0.829 | 0.598/0.945/0.585 | 0.916/0.996/0.891 |
Performance improvements are often in the range of +3–5% in primary metrics.
6.2 Ablation Studies
Stage-wise ablation on the Cora, Citeseer, and Amazon Photo datasets demonstrates the incremental impact of each pipeline component. The Engage stage alone notably improves accuracy, while the progressive addition of the Enact and Embed stages further increases final scores by 3–6% per stage. The full curriculum-guided version provides a final 1–2% improvement.
6.3 Convergence and Stability
Curriculum-guided staged attention leads to more stable training dynamics:
- Attention–gradient correlation increases monotonically from Stage 1 through Stage 3.
- Gradient variance slope decreases from ≈1.23 in Stage 1 (high variance) to ≈0.10 in Stage 3 (very stable).
- The staged curriculum approach accelerates convergence and mitigates instability typical in imbalanced GNN training.
6.4 Representation Visualization
Qualitative analyses using t-SNE show minority-class cluster separation becomes more pronounced after the curriculum. Confusion matrices indicate a significant reduction in minority-class misclassification rates, from over 30% to under 10%.
7. Implications and Context
CL3AN-GNN introduces a theoretically grounded methodology for curriculum learning in graph neural architectures by aligning feature exposure and aggregation complexity with model maturity. The sequential, staged attention mechanism is explicitly adapted to the challenges of imbalanced node classification, addressing class skew, training instability, and minority node misclassification.
A plausible implication is that staged curriculum attention, as implemented here, offers a template for future architectures requiring progressive induction of structural complexity, both in GNN and non-graph inductive paradigms. Its empirical gains in generalization, convergence rate, and stability suggest robustness to real-world label imbalance scenarios. Further exploration may establish extensions to heterogeneous, dynamic, or evolving graphs (Fofanah et al., 3 Feb 2026).