
WaveRNet: Wavelet-Guided Retinal Segmentation

Updated 16 January 2026
  • WaveRNet is a wavelet-guided frequency learning framework that decomposes retinal images into low- and high-frequency components for robust, domain-generalized segmentation.
  • It leverages specialized modules—including the Spectral-guided Domain Modulator, Frequency-Adaptive Domain Fusion, and Hierarchical Mask-Prompt Refiner—to dynamically fuse domain-specific features and refine segmentation masks.
  • Empirical evaluations demonstrate superior cross-domain performance with higher Dice scores compared to SAM-based and U-Net approaches under Leave-One-Domain-Out protocols.

WaveRNet is a wavelet-guided frequency learning architecture designed for multi-source domain-generalized retinal vessel segmentation. Developed to address the pervasive challenge of domain shift due to non-uniform illumination, contrast variation, and the need to preserve fine vessel structures, WaveRNet advances over existing Segment Anything Model (SAM)-based approaches by incorporating explicit frequency-domain decomposition with adaptive domain fusion and hierarchical mask refinement. Its architecture integrates a Spectral-guided Domain Modulator (SDM), Frequency-Adaptive Domain Fusion (FADF), and a Hierarchical Mask-Prompt Refiner (HMPR) for robust generalization across heterogeneous retinal imaging domains (Wang et al., 9 Jan 2026).

1. Core Architectural Components

WaveRNet processes a retinal image $x \in \mathbb{R}^{H \times W \times 3}$ using a series of specialized modules:

  • SAM Image Encoder with Adapters: Uses the ViT-B backbone from SAM with lightweight adapters inserted into each transformer block. All SAM weights are frozen. The encoder generates a feature map $\mathbf{F} \in \mathbb{R}^{C \times H' \times W'}$, with $H' = H/16$, $W' = W/16$.
  • Spectral-guided Domain Modulator (SDM): Decomposes $\mathbf{F}$ into low- and high-frequency branches using learnable convolutional "wavelet" layers. These branches are fused and modulated by learnable domain tokens projected by per-domain MLPs, producing domain-specific features $\mathbf{F}^{(k)}_{\rm SDM}$.
  • Frequency-Adaptive Domain Fusion (FADF): At test time, lacking domain labels, the model computes frequency prototypes for each source domain and for the test image. Cosine similarity between these prototypes yields softmax weights $w_k$, used to fuse the domain-specific features: $\mathbf{F}_{\rm fused} = \sum_k w_k \mathbf{F}_{\rm SDM}^{(k)}$.
  • Hierarchical Mask-Prompt Refiner (HMPR): Uses a two-stage decoder regime. Stage 1 yields a coarse mask $M_{256}$ at $256 \times 256$; this is re-encoded as a prompt, refined with self-attention, and supplied to Stage 2 to generate a finer mask $M_{512}$, eventually upsampled to $\hat{y} \in \mathbb{R}^{H \times W}$.

This architecture facilitates explicit disentanglement of frequency-domain information and dynamic domain adaptation during both training and inference (Wang et al., 9 Jan 2026).
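The module chain can be traced as a shape-level sketch in Python (all function names and sizes below are illustrative stubs under assumed tensor dimensions, not the paper's implementation):

```python
# Shape-level walkthrough of the WaveRNet pipeline (stub modules).
# Only tensor shapes are tracked; no values are computed.

H, W, C = 512, 512, 256      # assumed input resolution and encoder channels
K = 3                        # number of source domains under LODO training

def encoder(shape):
    # SAM ViT-B encoder with adapters: (H, W, 3) -> (C, H/16, W/16)
    h, w, _ = shape
    return (C, h // 16, w // 16)

def sdm(feat_shape, k):
    # Spectral-guided Domain Modulator: shape-preserving, one output per domain
    return feat_shape

def fadf(domain_feats):
    # Frequency-Adaptive Domain Fusion: a weighted sum keeps the shape
    return domain_feats[0]

def hmpr(feat_shape):
    # Hierarchical Mask-Prompt Refiner: coarse 256x256 mask, fine 512x512 mask,
    # then bilinear upsampling back to the input resolution
    return (256, 256), (512, 512), (H, W)

F = encoder((H, W, 3))
F_sdm = [sdm(F, k) for k in range(K)]
F_fused = fadf(F_sdm)
m256, m512, y_hat = hmpr(F_fused)
print(F, F_fused, m256, m512, y_hat)
```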

2. Frequency Decomposition and Domain Modulation

Classically, a 2D discrete wavelet transform (DWT) produces low-frequency ($F_{LL}$) and high-frequency ($F_{HH}$) components using fixed filters $\phi$, $\psi$:

$$F_{LL} = (\phi * (\phi * F)^T)^T, \quad F_{HH} = (\psi * (\psi * F)^T)^T$$
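As a concrete instance of this fixed-filter case, a one-level separable Haar transform (an illustrative filter choice, not specified by the paper) can be sketched in pure Python:

```python
# One-level separable Haar DWT: low-pass phi = [1/2, 1/2],
# high-pass psi = [1/2, -1/2], applied along rows, then columns.

def filt_rows(M, f):
    # Convolve each row with a 2-tap filter and downsample by 2.
    return [[f[0] * row[2*j] + f[1] * row[2*j + 1] for j in range(len(row) // 2)]
            for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dwt2(M, f):
    # Separable 2D transform: filter rows, transpose, filter again, transpose back.
    return transpose(filt_rows(transpose(filt_rows(M, f)), f))

phi = [0.5, 0.5]    # averaging -> low-frequency (illumination) content
psi = [0.5, -0.5]   # differencing -> high-frequency (edge) content

F = [[1, 1, 5, 5],
     [1, 1, 5, 5],
     [2, 2, 8, 8],
     [2, 2, 8, 8]]

F_LL = dwt2(F, phi)   # smooth 2x2 approximation
F_HH = dwt2(F, psi)   # diagonal detail; zero on this blockwise-constant image
print(F_LL, F_HH)
```

The low-pass output retains the coarse intensity layout, while the high-pass output isolates local variation, which is exactly the split WaveRNet makes learnable.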

WaveRNet replaces fixed filters with parameterized convolutional branches:

$$F_{\rm low} = W_{\rm low}(F) = \operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(\operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(F))))$$

$$F_{\rm high} = W_{\rm high}(F) = \operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(\operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(F))))$$

These are fused as:

$$F_{\rm wave} = \mathrm{Conv}_{1 \times 1}([F_{\rm low}; F_{\rm high}]) + \alpha \cdot F$$

where $\alpha \in \mathbb{R}$ is a learnable residual scalar. This preserves both illumination-robust low-frequency content and the high-frequency vessel-boundary structures critical for segmentation.
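A minimal sketch of the learnable decomposition and residual fusion, assuming single-channel feature maps and hand-picked kernels (all weights here are illustrative; in WaveRNet they are learned):

```python
# Sketch of the learnable "wavelet" branches: each branch is two 3x3 convs
# with ReLU, fused by a 1x1 conv over the concatenated branches plus a
# scaled residual. Single-channel toy maps; all weights are illustrative.

def conv3x3(M, k):
    # Zero-padded 3x3 convolution on a 2D list-of-lists.
    h, w = len(M), len(M[0])
    def px(i, j):
        return M[i][j] if 0 <= i < h and 0 <= j < w else 0.0
    return [[sum(k[a][b] * px(i + a - 1, j + b - 1)
                 for a in range(3) for b in range(3))
             for j in range(w)] for i in range(h)]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def branch(M, k):
    # W(F) = ReLU(Conv3x3(ReLU(Conv3x3(F)))), sharing k between layers for brevity
    return relu(conv3x3(relu(conv3x3(M, k)), k))

# Low branch ~ averaging kernel; high branch ~ Laplacian-like kernel.
k_low = [[1 / 9.0] * 3 for _ in range(3)]
k_high = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
alpha = 0.1                                    # learnable residual scalar

F = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
F_low, F_high = branch(F, k_low), branch(F, k_high)

# A 1x1 conv over the concatenation reduces to a per-pixel weighted sum here.
w1, w2 = 0.5, 0.5
F_wave = [[w1 * F_low[i][j] + w2 * F_high[i][j] + alpha * F[i][j]
           for j in range(4)] for i in range(4)]
print(F_wave)
```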

SDM operates by associating each domain $k$ with a learnable token $t_k \in \mathbb{R}^C$, projected via a two-layer MLP:

$$\tilde{t}_k = \mathrm{MLP}_k(t_k)$$

This token is spatially broadcast and added to $F_{\rm wave}$ to produce $\mathbf{F}_{\rm SDM}^{(k)}$. During training, the token corresponding to the sample's domain is used, enabling domain awareness during representation learning (Wang et al., 9 Jan 2026).
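A toy sketch of the token modulation, with an identity-weight MLP and a tiny feature map (all values below are illustrative assumptions):

```python
# Sketch of SDM's domain-token modulation: a per-domain token t_k in R^C is
# projected by a two-layer MLP (identity-like weights for illustration) and
# broadcast-added to every spatial position of F_wave.

C = 3
t_k = [0.02, -0.01, 0.03]                     # learnable domain token

def mlp(t, W1, W2):
    # Two-layer MLP with ReLU: W2 @ relu(W1 @ t)
    h = [max(0.0, sum(W1[i][j] * t[j] for j in range(C))) for i in range(C)]
    return [sum(W2[i][j] * h[j] for j in range(C)) for i in range(C)]

I = [[1.0 if i == j else 0.0 for j in range(C)] for i in range(C)]
t_tilde = mlp(t_k, I, I)                      # with identity weights: relu(t_k)

# F_wave as a C x H' x W' feature map (2x2 spatial toy grid)
F_wave = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(C)]

# Spatial broadcast add: F_SDM[c][i][j] = F_wave[c][i][j] + t_tilde[c]
F_sdm = [[[F_wave[c][i][j] + t_tilde[c] for j in range(2)]
          for i in range(2)] for c in range(C)]
print(F_sdm)
```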

3. Frequency-Adaptive Domain Fusion (FADF)

FADF addresses the problem of domain identification at test time, when no domain label is available. After training, per-domain prototypes are computed:

$$\overline{F}_{\rm low}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} \mathrm{GAP}\big(W_{\rm low}(F_i^k)\big) \in \mathbb{R}^C$$

$$\overline{F}_{\rm high}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} \mathrm{GAP}\big(W_{\rm high}(F_i^k)\big) \in \mathbb{R}^C$$

For a test image, prototypes $\overline{F}_{\rm low}^{\text{test}}$ and $\overline{F}_{\rm high}^{\text{test}}$ are extracted. Cosine similarity is measured for each source domain $k$:

$$s_k = \frac{1}{2} \left[ \frac{\langle \overline{F}_{\rm low}^{\text{test}}, \overline{F}_{\rm low}^{k} \rangle}{\|\overline{F}_{\rm low}^{\text{test}}\| \, \|\overline{F}_{\rm low}^{k}\|} + \frac{\langle \overline{F}_{\rm high}^{\text{test}}, \overline{F}_{\rm high}^{k} \rangle}{\|\overline{F}_{\rm high}^{\text{test}}\| \, \|\overline{F}_{\rm high}^{k}\|} \right]$$

Softmax weights (temperature $\tau$) yield the fusion coefficients:

$$w_k = \frac{e^{s_k/\tau}}{\sum_{j=1}^{K} e^{s_j/\tau}}$$

$$\mathbf{F}_{\rm fused} = \sum_{k=1}^{K} w_k \mathbf{F}_{\rm SDM}^{(k)}$$

This mechanism enables dynamic fusion of domain-adapted features based on frequency-domain similarity, constituting a frequency-driven soft domain selection (Wang et al., 9 Jan 2026).
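The selection mechanism can be sketched in pure Python; the prototype vectors, `tau`, and the scalar stand-ins for domain features below are illustrative assumptions:

```python
import math

# Sketch of FADF's frequency-driven soft domain selection: cosine similarity
# between test-image prototypes and per-domain prototypes, softmax with
# temperature tau, then a weighted fusion.

def cos(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def fadf_weights(low_t, high_t, protos, tau=0.5):
    # s_k = mean of low- and high-frequency cosine similarities
    s = [0.5 * (cos(low_t, pl) + cos(high_t, ph)) for pl, ph in protos]
    e = [math.exp(sk / tau) for sk in s]
    z = sum(e)
    return [ek / z for ek in e]

# Per-domain (low, high) frequency prototypes for K = 3 source domains
protos = [([1.0, 0.0], [0.0, 1.0]),
          ([0.0, 1.0], [1.0, 0.0]),
          ([0.7, 0.7], [0.7, 0.7])]

# A test image whose frequency statistics resemble domain 0
w = fadf_weights([0.9, 0.1], [0.1, 0.9], protos)
print(w)

# Fused feature = weighted sum of domain-specific features (scalars here)
F_sdm = [1.0, 2.0, 3.0]
F_fused = sum(wk * fk for wk, fk in zip(w, F_sdm))
```

Because the test image's prototypes align best with domain 0, the softmax assigns it the largest weight, while the temperature controls how sharply the fusion concentrates on that domain.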

4. Hierarchical Mask Generation and Refinement

The HMPR module mitigates detail loss observed in SAM's single-stage decoder. The two-stage process proceeds as:

  • Stage 1: Decoder $D_1$ receives $(\mathbf{F}_{\rm fused}, P_e)$ (spatial + prompt embeddings), producing mask $M_{256}$ at $256 \times 256$ resolution.
  • Prompt Feedback: $M_{256}$ is re-encoded as prompt $P'$ via the SAM prompt encoder $\mathcal{P}(\cdot)$, followed by self-attention to model long-range dependencies among vessel segments.
  • Stage 2: Decoder $D_2$ takes $(\mathbf{F}_{\rm fused}, P')$, outputs $M_{512}$ at $512 \times 512$, which is bilinearly upsampled to $\hat{y} \in \mathbb{R}^{H \times W}$.

Iterative refinement with mask-based prompts and attention stages preserves fine vessel details, including capillaries and small branches commonly lost in direct upsampling workflows (Wang et al., 9 Jan 2026).
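A shape-level sketch of the two-stage regime (the decoders and the SAM prompt encoder are stubs that track only resolutions; the input size is an arbitrary assumption):

```python
# Shape-level sketch of HMPR's two-stage decoding. Decoders and the SAM
# prompt encoder are stubs; only mask resolutions flow through.

H, W = 605, 700            # illustrative input resolution

def decoder_stage1(feat, prompt):
    return (256, 256)                      # coarse mask M_256

def prompt_encode(mask):
    # Re-encode the coarse mask as a prompt; self-attention is shape-preserving
    return ("prompt_from", mask)

def decoder_stage2(feat, prompt):
    return (512, 512)                      # refined mask M_512

def upsample(mask, h, w):
    return (h, w)                          # bilinear upsampling (shape only)

feat = ("F_fused",)
m256 = decoder_stage1(feat, ("P_e",))
m512 = decoder_stage2(feat, prompt_encode(m256))
y_hat = upsample(m512, H, W)
print(m256, m512, y_hat)
```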

5. Optimization and Training Protocols

WaveRNet is trained via a composite objective:

$$\mathcal{L}_{\rm total} = \lambda_1 \mathcal{L}_{\rm Dice} + \lambda_2 \mathcal{L}_{\rm Focal} + \mathcal{L}_{\rm MSE}$$

with $\lambda_1 = 1.0$, $\lambda_2 = 20.0$. The loss terms are:

  • Dice Loss: $1 - \dfrac{2 \sum (\hat{y} \cdot y)}{\sum \hat{y} + \sum y}$
  • Focal Loss ($\gamma = 2$): $-y (1-\hat{y})^\gamma \log \hat{y} - (1-y) \hat{y}^\gamma \log(1-\hat{y})$
  • IoU-prediction MSE: $(\hat{S} - s_{\rm IoU})^2$, where $s_{\rm IoU}$ is the actual IoU, providing mask-level self-assessment.
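A pure-Python sketch of the composite objective on flattened toy masks (the weights and $\gamma$ follow the reported values; the predictions and IoU scores are illustrative assumptions):

```python
import math

# Composite loss on flattened masks: Dice + weighted Focal + IoU-prediction MSE.

def dice_loss(p, y):
    return 1 - 2 * sum(pi * yi for pi, yi in zip(p, y)) / (sum(p) + sum(y))

def focal_loss(p, y, gamma=2.0):
    # Mean over pixels of the symmetric focal term; p must lie in (0, 1)
    n = len(p)
    return sum(-yi * (1 - pi) ** gamma * math.log(pi)
               - (1 - yi) * pi ** gamma * math.log(1 - pi)
               for pi, yi in zip(p, y)) / n

def iou_mse(iou_pred, iou_true):
    return (iou_pred - iou_true) ** 2

def total_loss(p, y, iou_pred, iou_true, lam1=1.0, lam2=20.0):
    return lam1 * dice_loss(p, y) + lam2 * focal_loss(p, y) + iou_mse(iou_pred, iou_true)

y = [1.0, 1.0, 0.0, 0.0]            # ground-truth vessel mask (flattened)
p_good = [0.9, 0.9, 0.1, 0.1]       # confident, mostly correct prediction
p_bad = [0.5, 0.5, 0.5, 0.5]        # uninformative prediction

print(total_loss(p_good, y, 0.8, 0.8), total_loss(p_bad, y, 0.5, 0.5))
```

The focal term's heavy weight ($\lambda_2 = 20$) reflects the severe foreground/background imbalance of thin vessels against the fundus background.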

Training follows the Leave-One-Domain-Out (LODO) protocol on DRIVE, STARE, CHASE_DB1, and RECOVERY-FA19 datasets. Metrics include Dice, IoU, and F1 (Wang et al., 9 Jan 2026).

6. Empirical Results and Ablation Analyses

WaveRNet exhibits high performance in both in-domain and cross-domain evaluation:

  • In-domain: Matches or slightly surpasses state-of-the-art U-Net variants; e.g., Dice ≈ 80.46% on DRIVE.
  • LODO cross-domain: U-Net variants degrade (Dice ≈ 25–40%), while SAM-based fine-tuning approaches plateau around 31–55%. WaveRNet achieves an average Dice ≈ 69.5% (next best: SAM-Med2D-FT at ≈ 60.1%). Per-domain Dice under LODO: DRIVE 78.55%, STARE 81.06%, CHASE_DB1 76.58%, RECOVERY-FA19 41.75%.

Ablations validate the contributions of core modules:

  • SDM provides the largest single gain under LODO (+6.79% Dice).
  • Both low- and high-frequency branches are essential; omitting either degrades LODO performance substantially.
  • HMPR is effective only when coupled with frequency-domain adaptation, supporting the necessity of joint frequency segmentation and hierarchical refinement.

7. Implementation Specifics

The model is implemented in PyTorch 2.1.0 with CUDA 12.8 and mixed-precision training on an NVIDIA RTX 4070 Ti GPU. Typical settings include batch size 2, 100 epochs, the Adam optimizer (learning rate $1 \times 10^{-4}$, exponential decay of $0.98$ per epoch), and initialization parameters $\alpha = 0.1$, $\tau = 0.5$, and domain-token std $= 0.02$. Image preprocessing replicates the official SAM pipeline (Wang et al., 9 Jan 2026).
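The reported decay implies the following learning-rate trajectory (a sketch assuming a simple per-epoch exponential schedule, as in PyTorch's ExponentialLR):

```python
# Learning-rate trajectory for the reported settings: Adam base lr 1e-4
# decayed by a factor of 0.98 after each of 100 epochs.

base_lr, gamma, epochs = 1e-4, 0.98, 100

lrs = [base_lr * gamma ** e for e in range(epochs)]
print(lrs[0], lrs[-1])   # initial and final learning rates
```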

The modular design and empirical validation of WaveRNet establish it as a reference method for robust domain-generalized segmentation in retinal imaging, providing a foundation for future frequency-based domain adaptation and fine-structure preservation strategies in medical image analysis.

References (1)
