
Coarse Guidance Network (CGN)

Updated 9 February 2026
  • Coarse Guidance Network (CGN) is a module that injects coarse spatial context into high-resolution patch features to enhance slide-level predictions in MIL frameworks.
  • It remaps instance features to a coarse grid using field-of-view driven binning and processes them through a lightweight convolutional head to compute a guidance map.
  • Empirical evaluations show that incorporating CGNs improves biomarker classification AUCs while maintaining low parameter and computational overhead.

A Coarse Guidance Network (CGN) is a module designed to learn and inject spatial contextual information at a coarser scale into high-magnification instance features within Multiple-Instance Learning (MIL) frameworks for whole-slide image (WSI) analysis. The CGN operates via grid-based remapping of instance features and a lightweight convolutional head to produce a coarse guidance map, which is then used to modulate the instance features before final attention-based aggregation. This approach enables progressive multi-scale context modeling in computational pathology tasks, offering a parameter-efficient mechanism for slide-level prediction enhancement while maintaining computational tractability (Wu et al., 2 Feb 2026).

1. Architectural Overview

The CGN processes high-magnification patch features $H \in \mathbb{R}^{N \times D}$ and their normalized spatial coordinates $(x'_n, y'_n) \in [0,1]^2$. Its core workflow includes three sequential steps:

  1. Grid-based Remapping: High-magnification features are aggregated into a 3D coarse feature map $M \in \mathbb{R}^{D \times H' \times W'}$ based on spatial bin assignments determined by a selectable field-of-view (FOV) parameter.
  2. Convolutional Guidance Head: $M$ is processed by two 3×3 convolutions with ReLU activations and a 1×1 convolution with Sigmoid activation to yield the coarse guidance map $P \in \mathbb{R}^{1 \times H' \times W'}$.
  3. Patch-level Gating: $P$ is flattened and indexed to obtain $M_A \in \mathbb{R}^{N}$, which gates each corresponding row of $H$, resulting in modulated features $H_k = H \odot M_A$.

The diagrammatic ASCII representation is:

H (N×D), coords (N×2)
         │
 ┌───────┴──────────────────────────┐
 │ Grid-based remapping             │
 │ → M ∈ ℝ^{D×H′×W′}, idx∈ℕ^N       │
 └───────┬──────────────────────────┘
         │
 ┌───────┴─────────┐
 │ Conv3×3(D→D′)   │
 │   → ReLU        │
 │ Conv3×3(D′→D′)  │
 │   → ReLU        │
 │ Conv1×1(D′→1)   │
 │   → Sigmoid     │
 └───────┬─────────┘
         │
     P ∈ ℝ^{1×H′×W′}
         │
     flatten & index by idx
         │
     M_A ∈ ℝ^N
         │
     H_k = H ⊙ M_A   (N×D)

2. Grid-based Remapping

Instance features and coordinates are mapped to a coarse grid via field-of-view driven binning. For each instance $n$ with normalized coordinates $(x'_n, y'_n)$, the grid cell assignment $(u_n, v_n)$ is determined as: $$u_n = \left\lfloor x'_n W' \right\rfloor,\quad v_n = \left\lfloor y'_n H' \right\rfloor,\quad \mathrm{idx}_n = v_n W' + u_n$$ where $W' = \lceil W/s \rceil$, $H' = \lceil H/s \rceil$, with $s$ the selected FOV.
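This binning can be sketched in a few lines of NumPy; the helper name and the clipping of boundary coordinates are our additions, not taken from the paper.

```python
import numpy as np

# Sketch of FOV-driven bin assignment. coords holds normalized (x', y')
# pairs in [0, 1]; W, H give the slide grid extent; s is the selected FOV.
def assign_bins(coords, W, H, s):
    Wp = int(np.ceil(W / s))  # W' = ceil(W / s)
    Hp = int(np.ceil(H / s))  # H' = ceil(H / s)
    # u_n = floor(x'_n * W'), v_n = floor(y'_n * H'); coordinates exactly
    # equal to 1.0 are clipped so they stay inside the grid.
    u = np.minimum((coords[:, 0] * Wp).astype(int), Wp - 1)
    v = np.minimum((coords[:, 1] * Hp).astype(int), Hp - 1)
    idx = v * Wp + u          # flattened index idx_n = v_n * W' + u_n
    return idx, Hp, Wp

coords = np.array([[0.0, 0.0], [0.49, 0.0], [0.51, 0.99]])
idx, Hp, Wp = assign_bins(coords, W=4, H=4, s=2)
print(idx)  # [0 0 3]
```

The first two instances share cell $(0,0)$ while the third lands in the opposite corner, illustrating how nearby patches collapse into one coarse bin.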

The feature map $M$ is computed by averaging all high-magnification vectors falling into each bin: $$M_{c,i,j} = \frac{1}{|G_{i,j}|} \sum_{n \in G_{i,j}} h_{n,c}, \qquad c = 1, \dots, D$$ where $G_{i,j}$ collects all instances assigned to grid cell $(i,j)$. In vectorized notation,

$$M_\text{sum}[m,:] = \sum_{n:\, \mathrm{idx}_n = m} h_n, \qquad M[m,:] = \frac{M_\text{sum}[m,:]}{\mathrm{count}[m]}$$

followed by reshaping $M$ to $(D, H', W')$.
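The scatter-mean above admits a compact vectorized sketch (variable names follow the text; the helper itself and the handling of empty bins are our assumptions).

```python
import numpy as np

# Average all instance vectors h_n sharing a flattened cell index m, then
# reshape the result to (D, H', W') as described in the text.
def remap(H_feat, idx, Hp, Wp):
    N, D = H_feat.shape
    M_sum = np.zeros((Hp * Wp, D))
    count = np.zeros(Hp * Wp)
    np.add.at(M_sum, idx, H_feat)   # M_sum[m] = sum over n with idx_n = m
    np.add.at(count, idx, 1.0)
    M = M_sum / np.maximum(count, 1.0)[:, None]  # empty bins stay zero
    return M.T.reshape(D, Hp, Wp)   # (H'W', D) -> (D, H', W')

H_feat = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
idx = np.array([0, 0, 3])
M = remap(H_feat, idx, Hp=2, Wp=2)
print(M.shape)  # (2, 2, 2)
```

`np.add.at` performs an unbuffered scatter-add, so repeated indices accumulate correctly, which a plain fancy-index assignment would not.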

3. Convolutional Guidance Computation

After remapping, $M$ is passed through three sequential convolutions:

$$U = \mathrm{ReLU}(\mathrm{Conv}_{3\times3}(M; W_1, b_1)) \in \mathbb{R}^{D' \times H' \times W'}$$

$$V = \mathrm{ReLU}(\mathrm{Conv}_{3\times3}(U; W_2, b_2)) \in \mathbb{R}^{D' \times H' \times W'}$$

$$P = \mathrm{Sigmoid}(\mathrm{Conv}_{1\times1}(V; W_3, b_3)) \in \mathbb{R}^{1 \times H' \times W'}$$

Here, $D' = 64$ is used as the hidden channel width for all CGN blocks. No self-attention or Transformer module is included; the head is purely convolutional.

$P$ is flattened to length $H' \cdot W'$, and each instance $n$ gathers its coarse guidance value $M_A[n]$ according to its assigned index. The final gated features are $H_k = H \odot M_A$.
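The head and the gating step can be sketched as a small PyTorch module; the class and argument names are ours, and padding of 1 on the 3×3 convolutions is our assumption to preserve the $H' \times W'$ grid.

```python
import torch
import torch.nn as nn

# Sketch of the purely convolutional guidance head (hidden width D' = 64)
# followed by patch-level gating, as described in the text.
class GuidanceHead(nn.Module):
    def __init__(self, D, Dp=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(D, Dp, 3, padding=1), nn.ReLU(),
            nn.Conv2d(Dp, Dp, 3, padding=1), nn.ReLU(),
            nn.Conv2d(Dp, 1, 1), nn.Sigmoid(),
        )

    def forward(self, M, idx, H_feat):
        P = self.net(M.unsqueeze(0)).squeeze(0)  # (1, H', W') guidance map
        M_A = P.flatten()[idx]                   # gather per-instance gate
        return H_feat * M_A.unsqueeze(1)         # H_k = H ⊙ M_A

D, N, Hp, Wp = 8, 5, 2, 2
head = GuidanceHead(D)
M = torch.randn(D, Hp, Wp)
idx = torch.randint(0, Hp * Wp, (N,))
H_k = head(M, idx, torch.randn(N, D))
print(H_k.shape)  # torch.Size([5, 8])
```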

4. Integration with Attention-based MIL

In standard attention-based MIL settings, instance embeddings $H \in \mathbb{R}^{N \times D}$ propagate through an attention aggregator $f$ to yield slide-level predictions: $$\widehat{Y} = \varphi(f(H))$$ Installing a CGN at scale $s_k$ updates $H$ as: $$H \leftarrow H + H_k, \qquad H_k = (h_1 M_A[1], \dots, h_N M_A[N])$$

Stacking multiple CGNs (for example, at FOVs $[1536, 2048, 3072]$) results in a progressive series of residual updates: $$H \leftarrow H + H_1; \qquad H \leftarrow H + H_2; \quad \dots$$ The final $H_{\text{mspn}}$ is then input to attention modules such as ABMIL, DSMIL, CLAM-SB, or CLAM-MB, which conduct the slide-level aggregation.
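Under this reading, the multi-scale stacking reduces to a loop of residual updates; the sketch below stands in each CGN with a simple callable rather than the full remap-and-gate module.

```python
import torch

# Progressive residual updates: each CGN at FOV s_k produces gated features
# H_k, which are added back onto H before the MIL aggregator runs.
def apply_cgns(H, cgn_blocks):
    for cgn in cgn_blocks:   # e.g. one block per FOV in [1536, 2048, 3072]
        H = H + cgn(H)       # H <- H + H_k
    return H                 # H_mspn, fed to ABMIL/DSMIL/CLAM

# Toy stand-ins for CGN blocks, to make the update order concrete.
H = torch.ones(4, 3)
H_mspn = apply_cgns(H, [lambda x: 0.5 * x, lambda x: 0.5 * x])
print(H_mspn[0, 0].item())  # 2.25  (1 -> 1.5 -> 2.25)
```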

5. Training Protocol and Hyperparameters

Training details for CGN-based models are as follows:

  • Losses: Biomarker tasks (ER, PR, HER2 status) use cross-entropy loss. Prognosis tasks (CRC Surv) use a negative log-likelihood loss (NLLSurvLoss) that combines censored and uncensored terms:

$$Y_{\mathrm{hazard}} = \mathrm{Sigmoid}(f(H_{\text{mspn}})), \qquad Y_{\mathrm{surv}} = \prod_t \left(1 - Y_{\mathrm{hazard},t}\right)$$

$$L_{\mathrm{censored}} = -\log Y_{\mathrm{surv}}, \qquad L_{\mathrm{uncensored}} = -\log Y_{\mathrm{surv}} - \log Y_{\mathrm{hazard}}$$

$$L_{\mathrm{surv}} = (1-\beta) L_{\mathrm{censored}} + \beta L_{\mathrm{uncensored}}$$
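A minimal single-sample sketch of this loss follows; the clamping epsilon, and the choice to read $Y_{\mathrm{hazard}}$ in the uncensored term as the hazard at the final time bin, are our assumptions rather than details given in the text.

```python
import torch

# Hedged sketch of NLLSurvLoss for one sample with discrete-time hazards.
def nll_surv_loss(hazards, beta=0.5, eps=1e-7):
    # Y_surv = prod_t (1 - Y_hazard,t)
    surv = torch.clamp(torch.prod(1.0 - hazards), eps, 1.0)
    # hazard term for the uncensored case; using the last bin is our
    # simplifying assumption
    hz = torch.clamp(hazards[-1], eps, 1.0)
    L_cens = -torch.log(surv)
    L_uncens = -torch.log(surv) - torch.log(hz)
    return (1 - beta) * L_cens + beta * L_uncens

loss = nll_surv_loss(torch.tensor([0.5]))
print(round(loss.item(), 4))  # 1.0397
```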

  • Optimizer: AdamW, learning rate $2\times 10^{-4}$, cosine-decay scheduler.
  • Early stopping: patience = 10.
  • Epochs: maximum 150.
  • FOV choices: At 20×, e.g., $[1536, 2048, 3072]$ pixels (providing 3 CGNs).
  • Hidden channels: $D' = 64$ per CGN.
  • Parameter and compute cost: Each CGN adds approximately $0.6$M parameters per scale.
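The optimizer settings above map directly onto standard PyTorch utilities; the linear layer here is a stand-in for the actual CGN-augmented MIL model.

```python
import torch

# Training configuration from the list above, sketched with stock PyTorch.
model = torch.nn.Linear(512, 2)            # stand-in for the real model
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=150)  # 150 epochs max
# Early stopping (patience = 10) would wrap the epoch loop; omitted here.
print(opt.param_groups[0]["lr"])  # 0.0002
```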

6. Empirical Performance and Ablation Results

Empirical studies isolating the CGN demonstrate a consistent benefit on multiple biomarker classification tasks. For instance, a single CGN (FOV=1536) added to ABMIL (using CONCH features) produces:

System                                          ER AUC (%)  PR AUC (%)  HER2 AUC (%)  Params (M)  FLOPs (G)
ABMIL w/o CGN (single-scale 20×)                87.22       84.14       80.06
ABMIL + single CGN (FOV=1536)                   88.92       84.76       80.84         ~1.51       ~17.7
ABMIL + three CGNs ([1536,2048,3072])           89.76       85.24       82.86         ~2.18       ~17.7
ABMIL + five CGNs ([1024,1536,2048,2560,3072])  91.42       84.18       84.62

Adding at least one CGN leads to a clear increase in slide-level AUC—e.g., gains of +1.70pp (ER), +0.62pp (PR), and +0.78pp (HER2) for a single scale. Stacking multiple CGNs for progressive multi-scale guidance further improves performance (e.g., +4.20pp for ER, +4.56pp for HER2). CGNs achieve these gains at reduced parameter and compute cost relative to methods such as concatenation or cross-scale attention schemes ($2.18$M parameters/$17.7$G FLOPs for CGN vs. $2.61$M/$38.4$G for cross-scale alternatives), while delivering larger accuracy improvements (+3.6pp ER, +4.05pp HER2).

7. Summary of Properties

A CGN remaps high-magnification features to a spatially coarse grid, applies a three-layer convolutional head to compute a coarse attention map, reprojects this map back to the patch level to gate the D-dimensional features, and is trained end-to-end via the same MIL objectives. Each CGN block is lightweight (requiring $D' = 64$ hidden channels, $\sim 0.6$M parameters per scale), incurs minimal additional computation, and has been shown to consistently improve slide-level prediction performance in clinical biomarker and prognosis tasks (Wu et al., 2 Feb 2026).
