CL3AN-GNN: Curriculum-Guided Three-Stage Attention Network
- The paper introduces a novel graph neural network that integrates curriculum-guided feature learning with a three-stage attention mechanism for imbalanced node classification.
- It employs a modular architecture with dual GCN/GAT backbones and curriculum-controlled subgraph selection to progressively manage feature complexity.
- Empirical evaluations show significant gains (+3–5% improvement) in accuracy, macro-F1, and AUC, reducing minority misclassification from over 30% to under 10%.
Curriculum-Guided Feature Learning and Three-Stage Attention Network (CL3AN-GNN) is a graph neural network (GNN) framework designed to address imbalanced node classification. It integrates curriculum learning with a three-stage attention mechanism (Engage, Enact, Embed) that sequentially organizes feature extraction and aggregation. The model systematically progresses from structurally simple to complex graph patterns, with an adaptive curriculum-aligned loss that promotes stable gradient flow and robust representation of minority class nodes. Empirical evaluations across standard benchmarks show significant gains in accuracy, macro-F1, and AUC compared to state-of-the-art baselines (Fofanah et al., 3 Feb 2026).
1. Architectural Overview
CL3AN-GNN adopts a modular architecture composed of four sequential modules: feature extraction (dual GCN/GAT backbone), embedding layer (linear projection and multi-head refinement), curriculum-guided three-stage attention (Engage, Enact, Embed), and a curriculum-weighted classifier. Input to the model is a graph $G=(V,E)$ with node features $X \in \mathbb{R}^{N \times F}$. Node and edge embeddings are formed using GCN and GAT, then projected to lower-dimensional representations $H_v$ and $H_e$. Through training epochs, CL3AN-GNN dynamically constructs a subgraph $G_t$ at each epoch $t$ based on a monotonically increasing difficulty threshold $\tau(t)$, then applies the three-stage attention pipeline and backpropagates a curriculum-aligned total loss. The final node classification is produced by a softmax over the last latent node embedding.
The operational workflow is encapsulated in the following high-level loop:
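As a concrete illustration, the loop can be sketched in Python. The names `tau` and `train_loop`, the linear threshold schedule, and the toy difficulty scores are placeholder assumptions, not the authors' implementation.

```python
import numpy as np

def tau(t, T, tau_min=0.2, tau_max=1.0):
    """Monotonically increasing difficulty threshold (linear schedule assumed)."""
    return tau_min + (tau_max - tau_min) * t / max(T - 1, 1)

def train_loop(node_difficulty, T=5):
    """Per epoch: select the curriculum subgraph, then (in the full model)
    run the Engage/Enact/Embed attention pipeline and backpropagate the
    curriculum-aligned loss on it."""
    subgraph_sizes = []
    for t in range(T):
        thr = tau(t, T)
        subgraph_nodes = [v for v, d in enumerate(node_difficulty) if d <= thr]
        # ... three-stage attention and loss computation would run here ...
        subgraph_sizes.append(len(subgraph_nodes))
    return subgraph_sizes

# The curriculum subgraph only grows as tau(t) increases.
sizes = train_loop(np.linspace(0.0, 1.0, 10), T=5)
```

The key invariant is that the per-epoch subgraph is monotonically non-decreasing in size, so earlier (easier) material is never withdrawn as harder structures are introduced.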
2. Curriculum-Guided Feature Learning
The curriculum is defined by per-node ($d_v$) and per-edge ($d_e$) difficulty functions, producing subgraphs $G_t \subseteq G$ of ascending complexity over training. Simpler features are prioritized: 1-hop neighborhood patterns, low-degree node attributes (local feature homogeneity), and class-separable node pairs (separated in the initial GCN/GAT embedding space). The curriculum threshold $\tau(t)$ increases monotonically with epoch $t$, introducing harder structures in later epochs.
The aggregated node difficulty metric combines these cues, e.g. as a weighted sum
$$d_v(i) = \alpha\, d_{\mathrm{deg}}(i) + \beta\, d_{\mathrm{hop}}(i) + \gamma\, d_{\mathrm{sep}}(i),$$
where $d_{\mathrm{sep}}(i)$ quantifies embedding-based separability.
During training, each epoch $t$ samples a curriculum subgraph $G_t = (V_t, E_t)$ with $V_t = \{\, i : d_v(i) \le \tau(t) \,\}$ and $E_t = \{\, e : d_e(e) \le \tau(t) \,\}$. This sequential exposure is foundational for stable learning under class imbalance, preventing early overfitting to majority classes.
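A minimal numeric sketch of such an aggregated difficulty score follows; the weights `alpha`, `beta`, `gamma`, the max-normalization, and the input names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def node_difficulty(degree, hop_complexity, sep_distance,
                    alpha=0.4, beta=0.3, gamma=0.3):
    """Combine normalized structural cues into a per-node difficulty in [0, 1].

    sep_distance: inter-class separation of a node's initial GCN/GAT
    embedding; larger separation means an easier node, hence (1 - sep)."""
    deg = degree / degree.max()
    hop = hop_complexity / hop_complexity.max()
    sep = 1.0 - sep_distance / sep_distance.max()
    return alpha * deg + beta * hop + gamma * sep

# A low-degree, well-separated node scores as "easier" than a
# high-degree, poorly separated one.
d = node_difficulty(np.array([1.0, 10.0]),
                    np.array([1.0, 5.0]),
                    np.array([5.0, 1.0]))
```

Thresholding such scores against $\tau(t)$ yields the per-epoch node set $V_t$.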
3. Three-Stage Attention Mechanism
CL3AN-GNN's core novelty is the staged attention mechanism:
- Engage Stage: Focuses attention on reliably discriminative, simple features—low-degree, class-homogeneous, and well-separated node pairs—via self-attention with scaled dot products. This step anchors the representation using broad majority signals and clean examples from minority classes.
- Enact Stage: Refines attention to encompass medium-difficulty graph patterns such as multi-hop paths and heterophilic connections. Queries are formed from concatenated representations of the node and its first-stage context aggregate. Adjustable attention weights detect cross-type and boundary nodes.
- Embed Stage: Consolidates all accumulated signals into a final discriminative embedding yielding robust minority class delineation. This final pass integrates increasingly complex structures, yielding enhanced representation for imbalanced settings.
Mathematically, each stage implements scaled dot-product self-attention updates via
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
with stage-specific constructions of the queries, keys, and values.
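Each stage's core update is standard scaled dot-product attention; a self-contained NumPy version is below, with the stage-specific query/key/value construction (e.g. Enact's concatenation of node and context representations) omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(Q, K, V):
    """Attn(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V

# With all-zero queries the attention weights are uniform, so each
# output row is the mean of the value rows.
V = np.arange(12.0).reshape(3, 4)
out = scaled_dot_attention(np.zeros((2, 4)), np.ones((3, 4)), V)
```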
Stage-wise ablation demonstrates additive performance gains, with Engage alone offering substantial improvements, further increased by progressive inclusion of Enact and Embed stages.
4. Loss Function and Optimization
The curriculum-aligned loss is epoch- and class-sensitive, dynamically reweighting classes based on running accuracy:
$$\mathcal{L}_{\mathrm{CE}}(t) = -\sum_{i \in V_t} w_{y_i}(t)\, \log p_{i, y_i}, \qquad w_c(t) = 1 + \lambda(t)\big(1 - \mathrm{acc}_c(t)\big),$$
where $\lambda(t)$ decreases from $\lambda_{\max}$ to $\lambda_{\min}$ over the $T$ training epochs, modulating the influence of class-specific error. The full loss consists of the curriculum-scheduled cross-entropy and regularizers:
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \beta_1(t)\,\mathcal{L}_{\mathrm{ent}} + \beta_2(t)\,\mathcal{L}_{\mathrm{div}} + \beta_3(t)\,\mathcal{L}_{\mathrm{grad}},$$
with the coefficients $\beta_i(t)$ scheduled to emphasize different loss components across the Engage, Enact, and Embed phases. Regularization includes entropy and diversity losses, and optionally a gradient stability penalty $\mathcal{L}_{\mathrm{grad}}$ to mitigate gradient explosion or vanishing.
This loss design explicitly adapts to per-epoch class learning progress, counteracting majority class bias as training progresses.
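The reweighting idea can be sketched as follows; the linear decay of `lam` and the exact form of `class_weights` are illustrative assumptions consistent with the description above, not the paper's formulas.

```python
import numpy as np

def lam(t, T, lam_max=1.0, lam_min=0.1):
    """Assumed linear decay of the curriculum coefficient over T epochs."""
    return lam_max - (lam_max - lam_min) * t / max(T - 1, 1)

def class_weights(running_acc, t, T):
    """Classes with low running accuracy get proportionally larger weights;
    the effect fades as lambda(t) decays."""
    return 1.0 + lam(t, T) * (1.0 - np.asarray(running_acc))

# Early in training the struggling class (30% running accuracy) is
# upweighted far more than the well-learned one (90%); by the final
# epoch the gap has largely closed.
early = class_weights([0.9, 0.3], t=0, T=10)
late = class_weights([0.9, 0.3], t=9, T=10)
```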
5. Benchmark Datasets and Evaluation Protocol
CL3AN-GNN is evaluated on eight benchmarks encompassing social, citation, and biological graphs:
| Dataset | Nodes | Edges | Classes | Imbalance Ratio $\rho$ | Feature Dim | Heterophily $h$ |
|---|---|---|---|---|---|---|
| Cora | 2,708 | 5,278 | 7 | 0.1–0.8 | 1,433 | 0.18 |
| Citeseer | 3,327 | 4,552 | 6 | 0.1–0.8 | 3,703 | 0.23 |
| PubMed | 19,717 | 88,648 | 3 | 0.1–0.8 | 5,414 | 0.21 |
| Amazon Photo | 7,650 | 119,081 | 8 | 0.1–0.8 | 745 | 0.42 |
| Amazon Comp. | 13,752 | 491,722 | 10 | 0.1–0.8 | 767 | 0.38 |
| Coauthor CS | 18,333 | 163,788 | 15 | 0.1–0.8 | 6,805 | 0.35 |
| Chameleon | 2,277 | 36,101 | 5 | 0.1–0.8 | 2,325 | 0.83 |
| OGBN-Arxiv | 169,343 | 1,166,243 | 40 | 0.1–0.8 | 128 | 0.29 |
Imbalance is controlled by artificially downsampling minority classes to achieve specific imbalance ratios $\rho$. Experiments use a 70/10/20 train/validation/test split over 10 runs with fixed seeds; reported metrics are accuracy, macro-F1, and AUC-ROC. Hyperparameters (hidden dimensions, learning rate, dropout, weight decay, attention heads, training epochs) are validated per benchmark.
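The downsampling protocol can be sketched as below; `downsample_to_ratio` is an illustrative helper (class with the largest count treated as majority, all others subsampled to $\rho$ times its size), not the authors' code.

```python
import numpy as np

def downsample_to_ratio(labels, rho, seed=0):
    """Keep all nodes of the largest class; subsample every other class
    to roughly rho times the majority-class count."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    majority_class = int(counts.argmax())
    target = max(1, int(rho * counts.max()))
    keep = []
    for c in range(len(counts)):
        idx = np.flatnonzero(labels == c)
        if c != majority_class and len(idx) > target:
            idx = rng.choice(idx, size=target, replace=False)
        keep.extend(int(i) for i in idx)
    return sorted(keep)

# Two balanced classes of 100 nodes each; rho = 0.2 leaves 100 majority
# and 20 minority training nodes.
labels = [0] * 100 + [1] * 100
kept = downsample_to_ratio(labels, rho=0.2)
```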
6. Experimental Results and Analysis
6.1 Quantitative Performance
CL3AN-GNN demonstrates consistent, substantial performance increases over the strongest baselines across all datasets. In representative results:
| Method | Cora (ACC/AUC/F1) | Citeseer | PubMed | Amazon Photo | Coauthor CS | Chameleon | OGBN-Arxiv | Amazon Comp. |
|---|---|---|---|---|---|---|---|---|
| Best Baseline | 0.835/0.973/0.829 | 0.778/0.928/0.745 | 0.823/0.963/0.815 | 0.918/0.939/0.905 | 0.889/0.986/0.882 | 0.818/0.923/0.807 | 0.579/0.907/0.561 | 0.889/0.988/0.826 |
| CL3AN-GNN | 0.950/0.998/0.949 | 0.907/0.995/0.892 | 0.877/0.961/0.872 | 0.929/0.989/0.922 | 0.953/0.998/0.923 | 0.855/0.963/0.829 | 0.598/0.945/0.585 | 0.916/0.996/0.891 |
Performance improvements are often in the range of +3–5% in primary metrics.
6.2 Ablation Studies
Stage-wise ablation on the Cora, Citeseer, and Amazon Photo datasets demonstrates the incremental impact of each pipeline component. The Engage stage alone notably improves accuracy, while the progressive addition of the Enact and Embed stages further increases final scores by 3–6% per stage. The full curriculum-guided version provides a final 1–2% improvement.
6.3 Convergence and Stability
Curriculum-guided staged attention leads to more stable training dynamics:
- Attention–gradient correlation increases monotonically from Stage 1 through Stage 3.
- Gradient variance slope decreases from ≈1.23 in Stage 1 (high variance) to ≈0.10 in Stage 3 (very stable).
- The staged curriculum approach accelerates convergence and mitigates instability typical in imbalanced GNN training.
6.4 Representation Visualization
Qualitative analyses using t-SNE show minority-class cluster separation becomes more pronounced after the curriculum. Confusion matrices indicate a significant reduction in minority-class misclassification rates, from over 30% to under 10%.
7. Implications and Context
CL3AN-GNN introduces a theoretically grounded methodology for curriculum learning in graph neural architectures by aligning feature exposure and aggregation complexity with model maturity. The sequential, staged attention mechanism is explicitly adapted to the challenges of imbalanced node classification, addressing class skew, training instability, and minority node misclassification.
A plausible implication is that staged curriculum attention, as implemented here, offers a template for future architectures requiring progressive induction of structural complexity, both in GNN and non-graph inductive paradigms. Its empirical gains in generalization, convergence rate, and stability suggest robustness to real-world label imbalance scenarios. Further exploration may establish extensions to heterogeneous, dynamic, or evolving graphs (Fofanah et al., 3 Feb 2026).