Coarse Guidance Network (CGN)
- A Coarse Guidance Network (CGN) is a module that injects coarse spatial context into high-resolution patch features to enhance slide-level predictions in MIL frameworks.
- It remaps instance features to a coarse grid using field-of-view driven binning and processes them through a lightweight convolutional head to compute a guidance map.
- Empirical evaluations show that incorporating CGNs improves biomarker classification AUCs while maintaining low parameter and computational overhead.
A Coarse Guidance Network (CGN) is a module designed to learn and inject spatial contextual information at a coarser scale into high-magnification instance features within Multiple-Instance Learning (MIL) frameworks for whole-slide image (WSI) analysis. The CGN operates via grid-based remapping of instance features and a lightweight convolutional head to produce a coarse guidance map, which is then used to modulate the instance features before final attention-based aggregation. This approach enables progressive multi-scale context modeling in computational pathology tasks, offering a parameter-efficient mechanism for slide-level prediction enhancement while maintaining computational tractability (Wu et al., 2 Feb 2026).
1. Architectural Overview
The CGN processes high-magnification patch features H ∈ ℝ^{N×D} and their normalized spatial coordinates coords ∈ [0, 1]^{N×2}. Its core workflow comprises three sequential steps:
- Grid-based Remapping: High-magnification features are aggregated into a 3D coarse feature map M ∈ ℝ^{D×H′×W′} based on spatial bin assignments determined by a selectable field-of-view (FOV) parameter; each instance also receives a flat bin index, idx ∈ ℕ^N.
- Convolutional Guidance Head: M is processed by two 3×3 convolutions with ReLU activations and a 1×1 convolution with Sigmoid activation to yield the coarse guidance map P ∈ ℝ^{1×H′×W′}.
- Patch-level Gating: P is flattened and indexed by idx to obtain M_A ∈ ℝ^N, which gates each corresponding row of H, resulting in modulated features H_k = H ⊙ M_A.
The diagrammatic ASCII representation is:
H (N×D), coords (N×2)
│
┌───────┴───────────────────────────┐
│ Grid-based remapping │
│ → M ∈ ℝ^{D×H′×W′}, idx∈ℕ^N │
└───────┬───────────────────────────┘
│
┌───────┴─────────┐
│ Conv3×3(D→D′) │
│ → ReLU │
│ Conv3×3(D′→D′) │
│ → ReLU │
│ Conv1×1(D′→1) │
│ → Sigmoid │
└───────┬─────────┘
│
P ∈ ℝ^{1×H′×W′}
│
flatten & index by idx
│
M_A ∈ ℝ^N
│
H_k = H ⊙ M_A (N×D)
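The workflow above can be condensed into a single module. The following is a minimal PyTorch sketch; the class name, the dense scatter-mean implementation, and the grid-size handling are illustrative assumptions, not the authors' reference code:

```python
import torch
import torch.nn as nn


class CoarseGuidanceNetwork(nn.Module):
    """Sketch of one CGN block: grid remapping -> conv head -> patch gating."""

    def __init__(self, in_dim, hidden_dim, grid_hw):
        super().__init__()
        self.grid_hw = grid_hw  # (H', W') coarse grid size for this FOV
        self.head = nn.Sequential(
            nn.Conv2d(in_dim, hidden_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden_dim, 1, 1), nn.Sigmoid(),
        )

    def forward(self, H, coords):
        # H: (N, D) instance features; coords: (N, 2) normalized (x, y) in [0, 1]
        N, D = H.shape
        Hp, Wp = self.grid_hw
        # --- grid-based remapping: average features falling into each bin ---
        h = (coords[:, 1] * Hp).long().clamp(max=Hp - 1)
        w = (coords[:, 0] * Wp).long().clamp(max=Wp - 1)
        idx = h * Wp + w                                  # flat bin index per instance
        M = H.new_zeros(Hp * Wp, D)
        M.index_add_(0, idx, H)                           # sum features per bin
        counts = torch.bincount(idx, minlength=Hp * Wp).clamp(min=1).unsqueeze(1)
        M = (M / counts).T.reshape(1, D, Hp, Wp)          # (1, D, H', W')
        # --- convolutional guidance head ---
        P = self.head(M)                                  # (1, 1, H', W'), in (0, 1)
        # --- patch-level gating: gather each instance's coarse guidance value ---
        M_A = P.flatten()[idx]                            # (N,)
        return H * M_A.unsqueeze(1)                       # H_k = H ⊙ M_A
```

Because the Sigmoid output lies in (0, 1), each instance feature is attenuated in proportion to the learned importance of its coarse neighborhood.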
2. Grid-based Remapping
Instance features and coordinates are mapped to a coarse grid via field-of-view driven binning. For each instance i with normalized coordinates (x_i, y_i) ∈ [0, 1]^2, the grid cell assignment is determined as h_i = ⌊y_i · H′⌋, w_i = ⌊x_i · W′⌋, where H′ = ⌈H_WSI / f⌉ and W′ = ⌈W_WSI / f⌉, with f the selected FOV (in pixels at the base magnification).
The feature map is computed by averaging all high-magnification vectors falling into each bin: M[:, h, w] = (1 / |S_{h,w}|) Σ_{i ∈ S_{h,w}} H_i, where S_{h,w} collects all instances assigned to grid cell (h, w). In vectorized notation, this is a scatter-mean of H over the flat indices idx, followed by reshaping to M ∈ ℝ^{D×H′×W′}.
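The binning and per-cell averaging amount to a scatter-mean, sketched here in NumPy (function name and the convention that empty bins stay zero are illustrative assumptions):

```python
import numpy as np


def grid_remap(H, coords, Hp, Wp):
    """Average (N, D) instance features into a (D, H', W') coarse map.

    Also returns each instance's flat bin index for later re-projection.
    coords holds normalized (x, y) positions in [0, 1].
    """
    N, D = H.shape
    h = np.minimum((coords[:, 1] * Hp).astype(int), Hp - 1)
    w = np.minimum((coords[:, 0] * Wp).astype(int), Wp - 1)
    idx = h * Wp + w                                   # flat bin index per instance
    sums = np.zeros((Hp * Wp, D))
    np.add.at(sums, idx, H)                            # scatter-add features per bin
    counts = np.maximum(np.bincount(idx, minlength=Hp * Wp), 1)[:, None]
    M = (sums / counts).T.reshape(D, Hp, Wp)           # empty bins remain zero
    return M, idx
```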
3. Convolutional Guidance Computation
After remapping, M is passed through three sequential convolutions: P = σ(Conv_{1×1}(ReLU(Conv_{3×3}(ReLU(Conv_{3×3}(M)))))), with P ∈ ℝ^{1×H′×W′}.
Here, D′ is used as the hidden channel width for all CGN blocks. No self-attention or Transformer module is included; the head is purely convolutional.
P is flattened to length H′·W′, and each instance gathers its coarse guidance value M_A[i] = P_flat[idx_i] according to its assigned index. The final gated features are H_k = H ⊙ M_A.
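The conv head and the index-based re-projection might look as follows in PyTorch (the dimensions D, D′, and the grid size are placeholders chosen for illustration):

```python
import torch
import torch.nn as nn

D, Dp, Hp, Wp = 16, 8, 4, 4            # feature dim D, hidden width D', grid H'xW'

# Purely convolutional guidance head: Conv3x3 -> ReLU -> Conv3x3 -> ReLU ->
# Conv1x1 -> Sigmoid, producing a single-channel coarse guidance map.
head = nn.Sequential(
    nn.Conv2d(D, Dp, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(Dp, Dp, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(Dp, 1, kernel_size=1), nn.Sigmoid(),
)

M = torch.randn(1, D, Hp, Wp)          # coarse feature map from remapping
P = head(M)                            # (1, 1, H', W'), values in (0, 1)

N = 10
idx = torch.randint(0, Hp * Wp, (N,))  # each instance's assigned flat bin index
H = torch.randn(N, D)
M_A = P.flatten()[idx]                 # gather coarse guidance per instance
H_k = H * M_A.unsqueeze(1)             # H_k = H ⊙ M_A
```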
4. Integration with Attention-based MIL
In standard attention-based MIL settings, instance embeddings H propagate through an attention aggregator to yield slide-level predictions: z = Σ_{i=1}^{N} a_i H_i, ŷ = g(z), where a_i are learned attention weights. Installing a CGN at scale f updates H as H ← H ⊙ M_A^{(f)}.
Stacking multiple CGNs (for example, at FOVs f ∈ {1536, 2048, 3072}) results in a progressive series of updates: H_k = H_{k−1} ⊙ M_A^{(k)}, k = 1, …, K, with H_0 = H. The final H_K is then input to attention modules such as ABMIL, DSMIL, CLAM-SB, or CLAM-MB, which conduct the slide-level aggregation.
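The progressive stacking ahead of an ABMIL-style aggregator can be sketched as follows. Here the CGN blocks are mocked as fixed per-instance gates in (0, 1) so the snippet is self-contained; a real pipeline would compute them from coords via the conv head:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D, N = 16, 32
H = torch.randn(N, D)                  # instance embeddings H_0

# Stand-ins for K = 3 CGN blocks at increasing FOVs, each yielding
# per-instance gates M_A^{(k)} in (0, 1).
cgn_gates = [torch.sigmoid(torch.randn(N)) for _ in range(3)]

# Progressive updates: H_k = H_{k-1} ⊙ M_A^{(k)}
for M_A in cgn_gates:
    H = H * M_A.unsqueeze(1)

# ABMIL-style attention over the modulated instances
attn = nn.Sequential(nn.Linear(D, 8), nn.Tanh(), nn.Linear(8, 1))
a = torch.softmax(attn(H), dim=0)      # (N, 1) attention weights over instances
z = (a * H).sum(dim=0)                 # slide-level embedding
logit = nn.Linear(D, 1)(z)             # slide-level prediction head
```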
5. Training Protocol and Hyperparameters
Training details for CGN-based models are as follows:
- Losses: Biomarker tasks (ER, PR, HER2 status) use cross-entropy loss. Prognosis tasks (CRC Surv) use a discrete-time negative log-likelihood survival loss (NLLSurvLoss) that combines censored and uncensored terms: L = −c · log S(t) − (1 − c) · [log S(t−1) + log h(t)], where h is the discrete hazard, S the survival function, and c the censoring indicator.
- Optimizer: AdamW with a cosine-decay learning-rate schedule.
- Early stopping: patience = 10.
- Epochs: maximum 150.
- FOV choices: At 20×, e.g., f ∈ {1536, 2048, 3072} pixels (providing 3 CGNs).
- Hidden channels: D′ per CGN.
- Parameter and compute cost: Each CGN adds approximately 0.6 M parameters per scale.
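The survival objective follows the standard discrete-time NLL form widely used in computational pathology (the Zadeh–Schmid formulation); the function name, epsilon clamping, and mean reduction below are assumptions, not details from the paper:

```python
import torch


def nll_surv_loss(hazards, t, c, eps=1e-7):
    """Discrete-time negative log-likelihood survival loss.

    hazards: (B, T) per-interval hazard probabilities in (0, 1)
    t:       (B,)  index of the event/censoring interval (long)
    c:       (B,)  censoring indicator (1 = censored, 0 = event observed)
    """
    S = torch.cumprod(1 - hazards, dim=1)                  # S(t) = prod_{j<=t} (1 - h(j))
    S_pad = torch.cat([torch.ones_like(S[:, :1]), S], 1)   # prepend S(-1) := 1
    t = t.unsqueeze(1)
    # censored term:   -log S(t)
    censored = -torch.log(S_pad.gather(1, t + 1).clamp(min=eps))
    # uncensored term: -log S(t-1) - log h(t)
    uncensored = -(torch.log(S_pad.gather(1, t).clamp(min=eps))
                   + torch.log(hazards.gather(1, t).clamp(min=eps)))
    return (c * censored.squeeze(1) + (1 - c) * uncensored.squeeze(1)).mean()
```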
6. Empirical Performance and Ablation Results
Empirical studies isolating the CGN demonstrate a consistent benefit on multiple biomarker classification tasks. For instance, a single CGN (FOV=1536) added to ABMIL (using CONCH features) produces:
| System | ER AUC (%) | PR AUC (%) | HER2 AUC (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| ABMIL w/o CGN (single-scale 20×) | 87.22 | 84.14 | 80.06 | — | — |
| ABMIL + single CGN (FOV=1536) | 88.92 | 84.76 | 80.84 | ~1.51 | ~17.7 |
| ABMIL + three CGNs ([1536,2048,3072]) | 89.76 | 85.24 | 82.86 | ~2.18 | ~17.7 |
| ABMIL + five CGNs ([1024,1536,2048,2560,3072]) | 91.42 | 84.18 | 84.62 | — | — |
Adding at least one CGN leads to a clear increase in slide-level AUC—e.g., gains of +1.70pp (ER), +0.62pp (PR), and +0.78pp (HER2) for a single scale. Stacking multiple CGNs for progressive multi-scale guidance further improves performance (e.g., +4.20pp for ER, +4.56pp for HER2). CGNs achieve these gains at reduced parameter and compute cost relative to methods such as concatenation or cross-scale attention schemes (2.18 M parameters / 17.7 G FLOPs for CGN vs. 2.61 M / 38.4 G for cross-scale alternatives), while delivering larger accuracy improvements (+3.6pp ER, +4.05pp HER2).
7. Summary of Properties
A CGN remaps high-magnification features to a spatially coarse grid, applies a three-layer convolutional head to compute a coarse attention map, reprojects this map back to the patch level to gate the D-dimensional features, and is trained end-to-end via the same MIL objectives. Each CGN block is lightweight (D′ hidden channels, approximately 0.6 M parameters per scale), incurs minimal additional computation, and has been shown to consistently improve slide-level prediction performance in clinical biomarker and prognosis tasks (Wu et al., 2 Feb 2026).