
SAGE-UNet: Adaptive Expert Segmentation Model

Updated 30 December 2025
  • The model introduces shape-adapting gated experts that dynamically select between CNN and Transformer modules for input-specific processing.
  • It employs a dual-path fusion strategy with learned gating to balance backbone representations and specialized expert outputs.
  • State-of-the-art performance is demonstrated with Dice scores above 94% on benchmarks like EBHI, DigestPath, and GlaS.

The SAGE-UNet architecture is a dynamically routed, dual-path encoder-decoder segmentation model that introduces Shape-Adapting Gated Experts (SAGE) for input-adaptive computation in heterogeneous visual networks. SAGE-UNet is designed to address the challenges of cellular heterogeneity in medical imaging, particularly for colonoscopic lesion segmentation, by adaptively selecting among a pool of heterogeneous experts (CNNs and Transformers) at every encoder level. Key innovations include a hierarchical gating and selection mechanism, a dual-path fusion strategy, and a Shape-Adapting Hub (SA-Hub) for seamless feature translation between diverse expert modules. The framework achieves state-of-the-art segmentation performance on EBHI, DigestPath, and GlaS medical benchmarks, with Dice scores of 95.57%, 95.16%, and 94.17%, respectively, highlighting its efficacy in robust domain generalization and flexible allocation of computation (Thai et al., 23 Nov 2025).

1. Dual-Path Expert-Backbone Fusion

At the core of SAGE-UNet is the replacement of each static encoder block with a two-path module:

  • Main path (backbone stream): At each encoder layer $i$, the forward propagation through the pretrained backbone is preserved as $z_i^{(\mathrm{main})} = f_i(z_{i-1})$.
  • Expert path: The same input $z_{i-1}$ is dynamically routed through a selected subset of $M = 20$ expert modules (4 shared and 16 fine-grained experts), based on hierarchical gating, producing an enriched feature $z_i^{(\mathrm{expert})}$.
  • Dual-path fusion: The layer output is a convex combination:

$$z_i = \alpha_i\,z_i^{(\mathrm{main})} + (1-\alpha_i)\,z_i^{(\mathrm{expert})}$$

where $\alpha_i = \sigma(\theta_i)$ is a learned gate. $\alpha_i \approx 1$ defaults to the backbone, while $\alpha_i \approx 0$ amplifies expert influence. This mechanism enables SAGE-UNet to fall back to the pretrained backbone in regions requiring standard representations and to invoke experts for fine-grained or globally ambiguous regions.
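The fusion step above can be sketched in a few lines of plain Python. This is a minimal illustration with toy list-valued features and a scalar gate parameter (the name `theta_i` and the 1-D shapes are assumptions for clarity; the actual model operates on full feature tensors):

```python
import math

def dual_path_fuse(z_main, z_expert, theta_i):
    """Convex combination of backbone and expert features, gated by a
    learned scalar parameter theta_i (hypothetical name)."""
    alpha = 1.0 / (1.0 + math.exp(-theta_i))  # alpha_i = sigmoid(theta_i)
    # alpha -> 1 falls back to the backbone; alpha -> 0 amplifies the experts
    return [alpha * m + (1.0 - alpha) * e for m, e in zip(z_main, z_expert)]

# Toy 4-dimensional features
z_main = [1.0, 2.0, 3.0, 4.0]
z_expert = [0.0, 0.0, 0.0, 0.0]
fused = dual_path_fuse(z_main, z_expert, theta_i=0.0)  # sigmoid(0) = 0.5
```

Because $\sigma(0) = 0.5$, the output above is an equal blend of the two paths; a large positive `theta_i` would reproduce the backbone feature almost exactly.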

2. Hierarchical Dynamic Expert Routing

SAGE-UNet employs a two-level, input-adaptive expert selection algorithm:

  • High-level gating: A lightweight gate computes $g_s = \sigma(\bar z_{i-1} W_{\mathrm{gate}}^{(i)} + b_{\mathrm{gate}}^{(i)})$, where $\bar z_{i-1}$ is the global average pooled input. Depending on its value, $g_s$ biases expert selection toward the shared ($\mathcal E_{\mathrm{shared}}$) or fine-grained ($\mathcal E_{\mathrm{fine}}$) experts.
  • Semantic Affinity Routing (SAR): Computes expert logits $\mathbf{L}_i$ via scaled dot-product attention with additive input-dependent noise to promote diversity:

$$\mathbf{L}_i = \frac{(\bar z_{i-1} W_Q^{(i)})\, K^{(i)\top}}{\sqrt{d_k}} + \sigma_{\mathrm{noise}}^{(i)} \odot \boldsymbol\epsilon^{(i)}, \qquad \boldsymbol\epsilon^{(i)} \sim \mathcal N(0, I)$$

  • Logit modulation: The logits are shifted using $g_s$ and a binary mask $\mathbf m_{\mathrm{shared}}$:

$$\mathbf{L}_i' = \mathbf{L}_i + \mathbf m_{\mathrm{shared}} \log(g_s) + (1 - \mathbf m_{\mathrm{shared}}) \log(1 - g_s)$$

  • Top-K selection: The top $K = 4$ experts per layer are selected as the indices of the $K$ largest entries of $\mathbf{L}_i'$, and their outputs are weighted and combined:

$$w_j = \begin{cases} \dfrac{\exp\big((\mathbf{L}_i')_j\big)}{\sum_{k \in \mathcal{I}} \exp\big((\mathbf{L}_i')_k\big)}, & j \in \mathcal{I} \\ 0, & \text{otherwise} \end{cases}$$

where $\mathcal{I} = \mathrm{TopKIndices}(\mathbf{L}_i', K)$. This routing enables the model to adaptively select experts specialized for the structure and semantics of the current input (Thai et al., 23 Nov 2025).
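The modulation and Top-K steps can be sketched as a small pure-Python routine. This is an illustrative reduction under stated simplifications: the SAR attention and the Gaussian noise term are omitted, logits are taken as given, and the function name and toy pool layout are assumptions:

```python
import math

def route_experts(logits, g_s, shared_mask, k=4):
    """Modulate routing logits with the high-level gate g_s, then select
    the Top-K experts and renormalize their weights with a softmax.
    The paper's noise injection is omitted here for determinism."""
    # Logit modulation: shared experts get +log(g_s), fine-grained +log(1 - g_s)
    mod = [L + (math.log(g_s) if m else math.log(1.0 - g_s))
           for L, m in zip(logits, shared_mask)]
    # Indices of the K largest modulated logits
    top = sorted(range(len(mod)), key=lambda j: mod[j], reverse=True)[:k]
    # Softmax restricted to the selected set; all other weights are zero
    denom = sum(math.exp(mod[j]) for j in top)
    return [math.exp(mod[j]) / denom if j in top else 0.0
            for j in range(len(mod))]

# Toy pool: 4 shared experts (mask=1) followed by 4 fine-grained (mask=0)
logits = [2.0, 1.5, 0.2, 0.1, 1.8, 1.7, 0.3, 0.0]
weights = route_experts(logits, g_s=0.8,
                        shared_mask=[1, 1, 1, 1, 0, 0, 0, 0], k=4)
```

With $g_s = 0.8$, shared experts receive the milder penalty $\log(0.8)$ while fine-grained experts receive $\log(0.2)$, so selection is biased toward the shared pool; exactly $K$ weights are nonzero and they sum to 1.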

3. Shape-Adapting Hub for Heterogeneous Expert Integration

The SA-Hub facilitates translation between the feature representations expected by diverse experts (CNN and Transformer):

  • Input adapter $S_\mathrm{in}$: Transforms $z_{i-1}$ into the expert-specific input space through reshaping, patchifying, or projection: $\tilde z_{i-1}^{(k)} = S_\mathrm{in}(z_{i-1}; e_k)$.
  • Expert execution: Expert $e_k$ computes its output $u_i^{(k)} = e_k(\tilde z_{i-1}^{(k)})$.
  • Output adapter $S_\mathrm{out}$: Projects the expert output back to the backbone-compatible space: $\hat z_i^{(k)} = S_\mathrm{out}(u_i^{(k)}, z_i^{(\mathrm{main})})$.
  • Expert path fusion: The overall expert feature is the weighted sum over the selected experts:

$$z_i^{(\mathrm{expert})} = \sum_{k \in \mathcal{I}} w_k\, \hat z_i^{(k)}$$

This approach ensures compatibility among experts with disparate architectures and input-output formats, removing the need for excessive manual tuning when incorporating heterogeneous modules (Thai et al., 23 Nov 2025).
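The adapt-execute-readapt pipeline can be sketched as follows. All names (`sa_hub_expert_path`, `s_in`, `s_out`) and the identity/padding adapters are hypothetical stand-ins; real adapters would reshape, patchify, or linearly project tensors:

```python
def sa_hub_expert_path(z_prev, z_main, experts, weights):
    """Sketch of the SA-Hub pipeline: adapt input -> run expert ->
    adapt output -> weighted fusion. Features are plain lists here."""
    def s_in(z, expert):
        # Input adapter: map z into the expert's input space.
        # Identity placeholder for this toy example.
        return z

    def s_out(u, z_main_ref):
        # Output adapter: force a backbone-compatible shape, here by
        # zero-padding/truncating to the main path's length (toy stand-in).
        return (u + [0.0] * len(z_main_ref))[:len(z_main_ref)]

    fused = [0.0] * len(z_main)
    for expert, w in zip(experts, weights):
        u = expert(s_in(z_prev, expert))   # expert execution
        z_hat = s_out(u, z_main)           # backbone-compatible output
        fused = [f + w * v for f, v in zip(fused, z_hat)]
    return fused

# Two toy "experts": a local doubling map (CNN-like) and a
# global mean-broadcast map (Transformer-like)
cnn_like = lambda z: [2.0 * v for v in z]
vit_like = lambda z: [sum(z) / len(z)] * len(z)
z_prev, z_main = [1.0, 3.0], [0.0, 0.0]
z_expert = sa_hub_expert_path(z_prev, z_main, [cnn_like, vit_like], [0.5, 0.5])
```

The point of the pattern is that the fusion loop never needs to know each expert's native input-output format; only the adapters do.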

4. Architectural Integration within the UNet Framework

SAGE-UNet maintains the canonical U-Net encoder-decoder structure, with specific modifications to the encoder:

  • Stem: The input $X$ is processed by an initial stem to obtain $z_0 = \mathsf{Stem}(X)$.
  • Encoder: For encoder depths $i = 1, \ldots, T$, each block implements the dual-path SAGE module, collecting features $z_i$ at each scale.
  • Skip-connections: Multiscale encoder outputs $z_i$ are forwarded to the decoder for spatially resolved fusion.
  • Decoder: The decoder utilizes standard U-Net upsampling and concatenation operations, fusing skip-connected features for refined spatial localization.
  • Segmentation head: Pixel-wise prediction is performed by the final head on the decoder output.

Within any encoder stage, the selected experts can comprise CNN or Transformer architectures depending on the input, dynamically balancing local and global feature extraction. This design enables SAGE-UNet to flexibly adapt capacity allocation and computational routing according to input complexity (Thai et al., 23 Nov 2025).
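Structurally, the encoder reduces to a stem followed by a loop over SAGE blocks that also collects skip features. The skeleton below uses hypothetical callables and scalar stand-ins for features; it shows control flow only, not real tensor computation:

```python
def sage_unet_encoder(x, stem, sage_blocks):
    """Encoder skeleton: stem, then T dual-path SAGE blocks, collecting
    per-scale features for the decoder's skip connections.
    `stem` and `sage_blocks` are hypothetical callables."""
    z = stem(x)
    skips = []
    for block in sage_blocks:   # each block fuses backbone + expert paths
        z = block(z)
        skips.append(z)         # forwarded to the decoder at this scale
    return z, skips

# Toy stand-ins: the stem and each block are simple numeric maps
stem = lambda x: x + 1
blocks = [lambda z: z * 2, lambda z: z * 2, lambda z: z * 2]
bottleneck, skips = sage_unet_encoder(1, stem, blocks)
```

The decoder (standard U-Net upsampling and concatenation) would consume `skips` in reverse order alongside `bottleneck`.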

5. Hyperparameter and Configuration Summary

The main configuration parameters are as follows:

| Parameter | Value/Description |
|---|---|
| Total experts $M$ | 20 |
| Shared experts | 4 |
| Fine-grained experts | 16 |
| Top-K per layer | 4 |
| Channel dims $D$ | As in ConvNeXt/ViT: 96, 192, 384, 768 |
| Query/key dim $d_k$ | 64 or 128 |
| Expert types (heterogeneous) | CNN and Transformer |

Gating between paths and among experts is implemented using soft sigmoid gates and Top-K thresholding. These design details are selected to optimize segmentation efficiency and adaptivity across scales and visual complexities (Thai et al., 23 Nov 2025).
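The table above can be captured as a small self-checking configuration. The key names below are illustrative, not taken from the paper's code:

```python
# Hypothetical configuration mirroring the reported hyperparameters
sage_config = {
    "total_experts": 20,
    "shared_experts": 4,
    "fine_grained_experts": 16,
    "top_k": 4,
    "channel_dims": [96, 192, 384, 768],  # as in ConvNeXt/ViT stages
    "d_k": 64,                            # query/key dim (64 or 128)
    "expert_types": ["cnn", "transformer"],
}

# Basic consistency: the expert pool partitions into shared + fine-grained,
# and Top-K cannot exceed the pool size.
assert sage_config["shared_experts"] + sage_config["fine_grained_experts"] \
       == sage_config["total_experts"]
assert sage_config["top_k"] <= sage_config["total_experts"]
```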

6. Adaptivity for Local-Global Feature Balancing

SAGE-UNet is designed to dynamically allocate focus based on spatial and semantic complexity:

  • In early, shallow layers associated with local pattern extraction (edges, textures), the high-level gate $g_s$ is learned to be large, biasing selection toward shared CNN experts.
  • In deeper layers (typically Transformer-based), $g_s$ approaches 0.5, promoting a blend of shared/global and fine-grained/context-aware experts.
  • The semantic affinity routing logits, modulated by $g_s$, yield context-appropriate expert selection. Dual-path fusion via $\alpha_i$ lets the model interpolate between backbone-like and expert-driven representations at each scale.

Segmentation of simple image regions proceeds through the main backbone, whereas complex or ambiguous regions invoke additional computation via experts tailored to either local or global content. This adaptivity underpins the model’s robust generalization to diverse histopathology benchmarks (Thai et al., 23 Nov 2025).
