ADD-GCN: Dynamic Graph Convolutional Networks

Updated 19 January 2026
  • ADD-GCN is a dynamic graph neural network that uses attention mechanisms to create image-specific label graphs and multi-level connectomes for Alzheimer's analysis.
  • It integrates static global graphs with sample-specific dynamic graphs to model label dependencies and enhance feature fusion across modalities.
  • Empirical evaluations on benchmarks like MS-COCO and ADNI demonstrate significant performance improvements, proving its effectiveness in both vision and neuroimaging.

The term ADD-GCN refers to two distinct but structurally related architectures for graph neural networks: (1) the Attention-Driven Dynamic Graph Convolutional Network for multi-label image recognition (Ye et al., 2020), and (2) a Multi-Level Generated Connectome-based GCN (MLC-GCN) for Alzheimer's disease analysis, referred to in some contexts as “ADD-GCN” (Zhu et al., 2024). Both models exemplify advances in dynamic graph construction and graph representation learning, tailored to the domain-specific challenges of visual recognition and connectome-based diagnosis, respectively.

1. Architectural Overview

In multi-label image recognition, ADD-GCN (Ye et al., 2020) is an end-to-end framework that decomposes convolutional feature maps into category-aware representations and models label dependencies using both static (global) and adaptive dynamic (image-specific) graphs. In the neuroimaging setting, MLC-GCN (“ADD-GCN”) (Zhu et al., 2024) generates sample-specific graphs (“connectomes”) at multiple representational depths from resting-state fMRI, subsequently aggregating graphical features for disease classification.

| Model | Graph Construction | Application Domain |
|---|---|---|
| ADD-GCN | Image-specific label graph (attention/dynamic) | Multi-label image recognition |
| MLC-GCN | Multi-level subject-specific connectomes | Alzheimer's disease (fMRI) |

Both leverage dynamic graph definition to overcome the brittleness of global, statistics-driven graph construction and to improve content/subject specificity.

2. Dynamic Graph Generation and Attention Mechanisms

ADD-GCN (Ye et al., 2020):

  • Utilizes a Semantic Attention Module (SAM) that generates $C$ category-specific attention maps $M = [m_1, \ldots, m_C]$ by applying a convolutional classifier and sigmoid activation to the backbone features.
  • Per-class content-aware representations $v_c$ are produced by spatially weighting the feature map $X'$ with the corresponding attention maps.
  • A static GCN computes global label dependencies using a co-occurrence or learned adjacency matrix $A_s$, while a secondary, per-image dynamic GCN constructs an image-specific adjacency $A_d$ from the concatenation of static node features and a globally pooled feature.
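As an illustration, the SAM step can be sketched in NumPy. Shapes, the location-wise normalization of the attention maps, and the use of a plain matrix multiply in place of the $1 \times 1$ convolution are simplifying assumptions; the paper's exact implementation may differ:

```python
import numpy as np

def semantic_attention(X, W_cls):
    """Sketch of a Semantic Attention Module (SAM).

    X     : (D, H, W) backbone feature map
    W_cls : (C, D) weights standing in for a 1x1 convolutional classifier
    Returns per-class content-aware representations V of shape (C, D).
    """
    D, H, W = X.shape
    Xf = X.reshape(D, H * W)                        # flatten spatial locations
    logits = W_cls @ Xf                             # (C, H*W) class activation maps
    M = 1.0 / (1.0 + np.exp(-logits))               # sigmoid attention maps
    M = M / (M.sum(axis=1, keepdims=True) + 1e-8)   # normalize over locations (assumption)
    V = M @ Xf.T                                    # (C, D) spatially weighted features
    return V

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8, 8))   # D=16 channels, 8x8 spatial grid
W_cls = rng.standard_normal((5, 16))  # C=5 categories
V = semantic_attention(X, W_cls)
print(V.shape)  # (5, 16)
```

Each row of `V` is one category's representation, obtained by pooling the feature map under that category's attention weights.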

MLC-GCN (“ADD-GCN”) (Zhu et al., 2024):

  • Employs a stack of Spatio-Temporal Feature Extractors (STFEs), each combining a transformer encoder (for spatial context) with a DLinear module (for temporal features).
  • For a set of $n$ fMRI ROI time series, hierarchical features $h_i$ at $K$ depths are produced and converted into connectomes via $A^{(i)} = h_i h_i^T$.
  • Both the learned connectomes and the baseline Pearson correlation matrix are encoded with independent GCNs before feature fusion.
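A minimal sketch of the outer-product connectome construction, assuming row-normalized features (the normalization is an assumption here, added so the resulting matrix behaves like a cosine-style similarity with unit diagonal):

```python
import numpy as np

def generate_connectome(h):
    """Turn level-specific ROI features h (n ROIs x d features) into a
    connectome A = h h^T, as in MLC-GCN's outer-product construction.
    Rows of h are L2-normalized first (an assumption, not from the paper)."""
    h = h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-8)
    return h @ h.T

rng = np.random.default_rng(1)
h = rng.standard_normal((90, 32))  # e.g. 90 ROIs, 32-dim features at one depth
A = generate_connectome(h)
print(A.shape)  # (90, 90)
```

The result is symmetric by construction, so it can be used directly as a graph adjacency for the downstream GCN.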

This attention-driven or feature-driven dynamic graph synthesis allows both models to adapt graphical structure to content, addressing limitations of static, global graphs.

3. Graph Convolution and Feature Propagation

In multi-label image recognition (Ye et al., 2020):

  • The static and dynamic adjacency matrices guide propagation over the content-aware category representations. The static GCN applies a single layer with $A_s$ and a LeakyReLU nonlinearity.
  • The dynamic adjacency $A_d$ is produced by a nodewise concatenation $h'_i = [h_i; h_g]$ followed by a $1 \times 1$ convolution, yielding a dense, image-conditional label affinity graph.
  • One propagation step is performed for each GCN (static, then dynamic), with LeakyReLU applied:

$$H = \mathrm{LeakyReLU}(A_s V W_s), \qquad Z = \mathrm{LeakyReLU}(A_d H W_d).$$

  • Final per-class scores arise from averaging the outputs of a per-class classifier on $z_c$ and an auxiliary classifier on the pooled attention maps.
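The static-then-dynamic propagation can be sketched as follows. The names `W_d_gen` (standing in for the $1 \times 1$ convolution that maps concatenated features to affinities) and the sigmoid on the dynamic affinities are illustrative assumptions, not details from the paper:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def add_gcn_propagation(V, A_s, W_s, W_d_gen, W_d):
    """Sketch of ADD-GCN's static-then-dynamic propagation.
    V       : (C, D) content-aware category representations
    A_s     : (C, C) static (co-occurrence/learned) adjacency
    W_d_gen : (2D, C) weights standing in for the 1x1 conv producing A_d
    """
    H = leaky_relu(A_s @ V @ W_s)                  # static GCN layer
    h_g = H.mean(axis=0, keepdims=True)            # globally pooled feature
    Hcat = np.concatenate([H, np.repeat(h_g, H.shape[0], axis=0)], axis=1)
    A_d = 1.0 / (1.0 + np.exp(-(Hcat @ W_d_gen)))  # (C, C) image-conditional adjacency
    Z = leaky_relu(A_d @ H @ W_d)                  # dynamic GCN layer
    return Z

C, D = 5, 16
rng = np.random.default_rng(2)
Z = add_gcn_propagation(rng.standard_normal((C, D)),
                        rng.standard_normal((C, C)),
                        rng.standard_normal((D, D)),
                        rng.standard_normal((2 * D, C)),
                        rng.standard_normal((D, D)))
print(Z.shape)  # (5, 16)
```

Note that `A_d` is recomputed per input, which is what makes the second propagation step image-specific.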

In connectome-based classification (Zhu et al., 2024):

  • For each generated connectome $A^{(i)}$, a two-layer GCN is applied:

$$h_{j+1}^{(i)} = \sigma\big(\widehat{A}^{(i)} h_{j}^{(i)} W_j^{(i)}\big), \quad j = 0, 1,$$

with $\widehat{A}^{(i)} = A^{(i)} + I$ and ReLU activation $\sigma$.

  • Embeddings from all levels are concatenated and passed through a multi-layer perceptron and softmax for final diagnosis/classification.
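The per-connectome encoding and embedding-level fusion described above can be sketched as follows; the mean-pooling readout and the single-layer stand-in for the MLP are assumptions made to keep the example compact:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlc_gcn_head(connectomes, h0, weights, W_mlp):
    """Sketch of MLC-GCN's per-connectome encoding and fusion.
    connectomes : list of K (n, n) adjacency matrices A^(i)
    h0          : (n, d) shared initial node features
    weights     : list of K pairs (W0, W1) for the two GCN layers
    Self-loops are added via A_hat = A + I, matching the formula above."""
    embeddings = []
    for A, (W0, W1) in zip(connectomes, weights):
        A_hat = A + np.eye(A.shape[0])
        h1 = relu(A_hat @ h0 @ W0)          # first GCN layer
        h2 = relu(A_hat @ h1 @ W1)          # second GCN layer
        embeddings.append(h2.mean(axis=0))  # node readout (an assumption)
    z = np.concatenate(embeddings)          # fuse all K level embeddings
    logits = z @ W_mlp                      # single-layer stand-in for the MLP
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax class probabilities

rng = np.random.default_rng(3)
n, d, K, n_cls = 20, 8, 3, 2
probs = mlc_gcn_head([rng.standard_normal((n, n)) for _ in range(K)],
                     rng.standard_normal((n, d)),
                     [(rng.standard_normal((d, d)), rng.standard_normal((d, d)))
                      for _ in range(K)],
                     rng.standard_normal((K * d, n_cls)))
print(probs.shape)  # (2,)
```

Because the embeddings, not the predictions, are concatenated before classification, all $K$ levels contribute jointly to the final decision.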

A notable distinction is the fusion at feature or prediction level: image recognition fuses prediction scores, while connectome analysis fuses per-graph embeddings.

4. Training Protocols and Loss Functions

Multi-label Image Recognition ADD-GCN (Ye et al., 2020):

  • Binary cross-entropy is applied independently to each class prediction. No explicit auxiliary losses or regularizers are introduced beyond standard weight decay.
  • The backbone is a pre-trained ResNet-101; the nonlinearity is LeakyReLU (slope = 0.2); data augmentation includes resize/crop/flip; the optimizer is SGD with momentum 0.9, with learning rates and decay schedules as specified in the paper.
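The per-class binary cross-entropy objective can be sketched as follows (the averaging convention over classes and samples is an assumption; implementations vary between mean and sum reductions):

```python
import numpy as np

def multilabel_bce(logits, targets):
    """Multi-label binary cross-entropy of the kind used to train ADD-GCN:
    sigmoid + BCE applied independently per class, averaged over all
    class/sample entries."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # guard against log(0)
    return -np.mean(targets * np.log(p + eps)
                    + (1 - targets) * np.log(1 - p + eps))

logits = np.array([[2.0, -1.0, 0.5], [-0.5, 3.0, -2.0]])
targets = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
loss = multilabel_bce(logits, targets)
print(loss > 0.0)  # True
```

Treating each class independently is what lets the model predict any subset of labels rather than a single mutually exclusive class.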

MLC-GCN (“ADD-GCN”) (Zhu et al., 2024):

  • Uses categorical cross-entropy on class logits, along with an intra-group regularization $L_{\mathrm{group}}$ that encourages connectomes from the same clinical group to cluster in adjacency space:

$$L_{\mathrm{group}} = \frac{1}{K} \sum_{i=1}^{K} \sum_{c=1}^{C} \frac{1}{|S^c|} \sum_{u \in S^c} \big\| A_u^{(i)} - \mu_c^{(i)} \big\|_2^2,$$

where $\mu_c^{(i)}$ is the mean adjacency for group $c$ at level $i$, and $S^c$ is the set of subjects in group $c$.

  • AdamW optimizer, early stopping, dropout, and Mixup data augmentation are employed.
  • Preprocessing follows standard fMRI pipelines (slice timing, realignment, normalization, bandpass filtering).
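The intra-group regularizer can be computed directly from the formula above; this sketch assumes the connectomes for all subjects are stacked into a single array:

```python
import numpy as np

def intra_group_loss(connectomes, labels):
    """Intra-group regularizer L_group.
    connectomes : (N, K, n, n) array — K connectome levels for N subjects
    labels      : (N,) integer clinical-group labels
    For each level i and group c, penalizes the squared Frobenius distance
    of each subject's adjacency to that group's mean adjacency."""
    N, K, n, _ = connectomes.shape
    loss = 0.0
    for i in range(K):
        for c in np.unique(labels):
            A_c = connectomes[labels == c, i]   # adjacencies of group c at level i
            mu = A_c.mean(axis=0)               # group-mean adjacency mu_c^(i)
            loss += ((A_c - mu) ** 2).sum() / len(A_c)
    return loss / K

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2, 4, 4))  # 6 subjects, 2 levels, 4x4 connectomes
y = np.array([0, 0, 0, 1, 1, 1])
print(intra_group_loss(A, y) >= 0.0)  # True
```

The loss is zero exactly when every subject's connectome equals its group mean, so minimizing it pulls same-group adjacencies together.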

5. Experimental Results and Comparative Performance

Image Recognition Benchmarks (Ye et al., 2020):

  • MS-COCO: mAP = 85.2% (prior SOTA SSGRL: 83.8%, ML-GCN: 83.0%)
  • VOC2007: mAP = 96.0% (prior SOTA SSGRL: 95.0%, ML-GCN: 94.0%)
  • VOC2012: mAP = 95.5% (prior SOTA SSGRL: 94.8%)
  • Gains are consistent across datasets, with notable improvements over previous label dependency models.

AD and MCI Classification (Zhu et al., 2024):

  • On ADNI, binary classification: MLC-GCN (depth 24) Acc = 95.74 ± 0.90%, AUC = 97.76 ± 2.17 (baseline DABNet/LG-GNN: Acc ≈ 93.4%, AUC ≈ 95.1%).
  • On OASIS-3, multi-class: Acc = 90.56 ± 1.30%, AUC = 94.36 ± 1.24 (baseline: Acc ≈ 89.3%, AUC ≈ 94.1%).
  • Ablation demonstrates the necessity of both temporal and spatial modules as well as the intra-group loss, each contributing 1–3% absolute performance gain.

6. Interpretability, Analysis, and Limitations

Interpretability:

  • In (Ye et al., 2020), dynamic label graphs adapt to each image, reducing spurious correlations and focusing attention in category-wise feature extraction.
  • In (Zhu et al., 2024), the sparsity and anatomical distribution of learned connectomes are analyzed. The strongest connections are observed in prefrontal and temporal lobes, and highly ranked ROIs correspond to known AD-affected regions, including SFG, MFG, IFG, PCL, STG, and MTG. This mapping to neuroscientific biomarkers affirms the biological plausibility of the extracted connectivity patterns.

Limitations:

  • The GCN graph encoder is standard in MLC-GCN; future use of more complex GNN modules (e.g., GAT, InceptionGCN) may enhance performance.
  • Multi-modal integration (structural MRI, PET) is not yet implemented but is straightforward in the multi-stream graph framework.
  • The dynamic dot-product graph generation paradigm may be sensitive to feature scaling; learning explicit sparse or thresholded structures is a potential avenue for future research.
  • Generalizability beyond the evaluated cohorts and domains remains to be comprehensively validated.

7. Context and Outlook

The ADD-GCN paradigm represents a move towards graph neural network architectures that explicitly account for sample-specific relational structure rather than relying on global or static graphs. In computer vision, this leads to robust modeling of label dependencies in images, reducing overfitting to training co-occurrence. In neuroimaging, dynamically-generated multi-level connectomes expand both predictive performance and neuroscientific interpretability. While different in their application scope, both instantiations showcase the benefit of fusing attention or deep hierarchical features with graph-based reasoning, and set a foundation for further developments in dynamic GNNs and adaptive graph construction strategies for structured prediction and diagnosis (Ye et al., 2020, Zhu et al., 2024).
