
WaveRNet: Wavelet-Guided Retinal Segmentation

Updated 16 January 2026
  • WaveRNet is a wavelet-guided frequency learning framework that decomposes retinal images into low- and high-frequency components for robust, domain-generalized segmentation.
  • It leverages specialized modules—including the Spectral-guided Domain Modulator, Frequency-Adaptive Domain Fusion, and Hierarchical Mask-Prompt Refiner—to dynamically fuse domain-specific features and refine segmentation masks.
  • Empirical evaluations demonstrate superior cross-domain performance with higher Dice scores compared to SAM-based and U-Net approaches under Leave-One-Domain-Out protocols.

WaveRNet is a wavelet-guided frequency learning architecture designed for multi-source domain-generalized retinal vessel segmentation. Developed to address the pervasive challenge of domain shift due to non-uniform illumination, contrast variation, and the need to preserve fine vessel structures, WaveRNet advances over existing Segment Anything Model (SAM)-based approaches by incorporating explicit frequency-domain decomposition with adaptive domain fusion and hierarchical mask refinement. Its architecture integrates a Spectral-guided Domain Modulator (SDM), Frequency-Adaptive Domain Fusion (FADF), and a Hierarchical Mask-Prompt Refiner (HMPR) for robust generalization across heterogeneous retinal imaging domains (Wang et al., 9 Jan 2026).

1. Core Architectural Components

WaveRNet processes a retinal image $x \in \mathbb{R}^{H \times W \times 3}$ using a series of specialized modules:

  • SAM Image Encoder with Adapters: Uses the ViT-B backbone from SAM with lightweight adapters inserted into each transformer block. All SAM weights are frozen. The encoder generates a feature map $\mathbf{F} \in \mathbb{R}^{C \times H' \times W'}$, with $H' = H/16$, $W' = W/16$.
  • Spectral-guided Domain Modulator (SDM): Decomposes $\mathbf{F}$ into low- and high-frequency branches using learnable convolutional "wavelet" layers. These branches are fused and modulated by learnable domain tokens projected by per-domain MLPs, producing domain-specific features $\mathbf{F}^{(k)}_{\rm SDM}$.
  • Frequency-Adaptive Domain Fusion (FADF): At test time, lacking domain labels, the model computes frequency prototypes for each source domain and for the test image. Cosine similarity between these prototypes yields softmax weights $w_k$, used to fuse the domain-specific features: $\mathbf{F}_{\rm fused} = \sum_k w_k \mathbf{F}_{\rm SDM}^{(k)}$.
  • Hierarchical Mask-Prompt Refiner (HMPR): Uses a two-stage decoder regime. Stage 1 yields a coarse mask $M_{256}$ at $256 \times 256$; this is re-encoded as a prompt, refined with self-attention, and supplied to Stage 2 to generate a finer mask $M_{512}$, eventually upsampled to $\hat{y} \in \mathbb{R}^{H \times W}$.

This architecture facilitates explicit disentanglement of frequency-domain information and dynamic domain adaptation during both training and inference (Wang et al., 9 Jan 2026).
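The module chain can be traced as a shape-level sketch in Python (all function names and sizes below are illustrative stubs under assumed tensor dimensions, not the paper's implementation):

```python
# Shape-level walkthrough of the WaveRNet pipeline (stub modules).
# Only tensor shapes are tracked; no values are computed.

H, W, C = 512, 512, 256      # assumed input resolution and encoder channels
K = 3                        # number of source domains under LODO training

def encoder(shape):
    # SAM ViT-B encoder with adapters: (H, W, 3) -> (C, H/16, W/16)
    h, w, _ = shape
    return (C, h // 16, w // 16)

def sdm(feat_shape, k):
    # Spectral-guided Domain Modulator: shape-preserving, one output per domain
    return feat_shape

def fadf(domain_feats):
    # Frequency-Adaptive Domain Fusion: a weighted sum keeps the shape
    return domain_feats[0]

def hmpr(feat_shape):
    # Hierarchical Mask-Prompt Refiner: coarse 256x256 mask, fine 512x512 mask,
    # then bilinear upsampling back to the input resolution
    return (256, 256), (512, 512), (H, W)

F = encoder((H, W, 3))
F_sdm = [sdm(F, k) for k in range(K)]
F_fused = fadf(F_sdm)
m256, m512, y_hat = hmpr(F_fused)
print(F, F_fused, m256, m512, y_hat)
```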

2. Frequency Decomposition and Domain Modulation

Classically, a 2D discrete wavelet transform (DWT) produces low-frequency ($F_{LL}$) and high-frequency ($F_{HH}$) components using fixed filters $\phi$, $\psi$:

$$F_{LL} = (\phi * (\phi * F)^T)^T, \quad F_{HH} = (\psi * (\psi * F)^T)^T$$
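As a concrete instance of this fixed-filter case, a one-level separable Haar transform (an illustrative filter choice, not specified by the paper) can be sketched in pure Python:

```python
# One-level separable Haar DWT: low-pass phi = [1/2, 1/2],
# high-pass psi = [1/2, -1/2], applied along rows, then columns.

def filt_rows(M, f):
    # Convolve each row with a 2-tap filter and downsample by 2.
    return [[f[0] * row[2*j] + f[1] * row[2*j + 1] for j in range(len(row) // 2)]
            for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dwt2(M, f):
    # Separable 2D transform: filter rows, transpose, filter again, transpose back.
    return transpose(filt_rows(transpose(filt_rows(M, f)), f))

phi = [0.5, 0.5]    # averaging -> low-frequency (illumination) content
psi = [0.5, -0.5]   # differencing -> high-frequency (edge) content

F = [[1, 1, 5, 5],
     [1, 1, 5, 5],
     [2, 2, 8, 8],
     [2, 2, 8, 8]]

F_LL = dwt2(F, phi)   # smooth 2x2 approximation
F_HH = dwt2(F, psi)   # diagonal detail; zero on this blockwise-constant image
print(F_LL, F_HH)
```

The low-pass output retains the coarse intensity layout, while the high-pass output isolates local variation, which is exactly the split WaveRNet makes learnable.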

WaveRNet replaces fixed filters with parameterized convolutional branches:

$$F_{\rm low} = W_{\rm low}(F) = \operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(\operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(F))))$$

$$F_{\rm high} = W_{\rm high}(F) = \operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(\operatorname{ReLU}(\mathrm{Conv}_{3 \times 3}(F))))$$

These are fused as:

$$F_{\rm wave} = \mathrm{Conv}_{1 \times 1}([F_{\rm low}; F_{\rm high}]) + \alpha \cdot F$$

where $\alpha \in \mathbb{R}$ is a learnable residual scalar. This preserves both illumination-robust low-frequency content and the high-frequency vessel-boundary structures critical for segmentation.
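A minimal sketch of the learnable decomposition and residual fusion, assuming single-channel feature maps and hand-picked kernels (all weights here are illustrative; in WaveRNet they are learned):

```python
# Sketch of the learnable "wavelet" branches: each branch is two 3x3 convs
# with ReLU, fused by a 1x1 conv over the concatenated branches plus a
# scaled residual. Single-channel toy maps; all weights are illustrative.

def conv3x3(M, k):
    # Zero-padded 3x3 convolution on a 2D list-of-lists.
    h, w = len(M), len(M[0])
    def px(i, j):
        return M[i][j] if 0 <= i < h and 0 <= j < w else 0.0
    return [[sum(k[a][b] * px(i + a - 1, j + b - 1)
                 for a in range(3) for b in range(3))
             for j in range(w)] for i in range(h)]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def branch(M, k):
    # W(F) = ReLU(Conv3x3(ReLU(Conv3x3(F)))), sharing k between layers for brevity
    return relu(conv3x3(relu(conv3x3(M, k)), k))

# Low branch ~ averaging kernel; high branch ~ Laplacian-like kernel.
k_low = [[1 / 9.0] * 3 for _ in range(3)]
k_high = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
alpha = 0.1                                    # learnable residual scalar

F = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
F_low, F_high = branch(F, k_low), branch(F, k_high)

# A 1x1 conv over the concatenation reduces to a per-pixel weighted sum here.
w1, w2 = 0.5, 0.5
F_wave = [[w1 * F_low[i][j] + w2 * F_high[i][j] + alpha * F[i][j]
           for j in range(4)] for i in range(4)]
print(F_wave)
```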

SDM operates by associating each domain $k$ with a learnable token $t_k \in \mathbb{R}^C$, projected via a two-layer MLP:

$$\tilde{t}_k = \mathrm{MLP}_k(t_k)$$

This token is spatially broadcast and added to $F_{\rm wave}$ to produce $\mathbf{F}_{\rm SDM}^{(k)}$. During training, the token corresponding to the sample's domain is used, enabling domain awareness during representation learning (Wang et al., 9 Jan 2026).
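A toy sketch of the token modulation, with an identity-weight MLP and a tiny feature map (all values below are illustrative assumptions):

```python
# Sketch of SDM's domain-token modulation: a per-domain token t_k in R^C is
# projected by a two-layer MLP (identity-like weights for illustration) and
# broadcast-added to every spatial position of F_wave.

C = 3
t_k = [0.02, -0.01, 0.03]                     # learnable domain token

def mlp(t, W1, W2):
    # Two-layer MLP with ReLU: W2 @ relu(W1 @ t)
    h = [max(0.0, sum(W1[i][j] * t[j] for j in range(C))) for i in range(C)]
    return [sum(W2[i][j] * h[j] for j in range(C)) for i in range(C)]

I = [[1.0 if i == j else 0.0 for j in range(C)] for i in range(C)]
t_tilde = mlp(t_k, I, I)                      # with identity weights: relu(t_k)

# F_wave as a C x H' x W' feature map (2x2 spatial toy grid)
F_wave = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(C)]

# Spatial broadcast add: F_SDM[c][i][j] = F_wave[c][i][j] + t_tilde[c]
F_sdm = [[[F_wave[c][i][j] + t_tilde[c] for j in range(2)]
          for i in range(2)] for c in range(C)]
print(F_sdm)
```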

3. Frequency-Adaptive Domain Fusion (FADF)

FADF addresses the problem of domain identification at test time, when no domain label is available. After training, per-domain prototypes are computed:

$$\overline{F}_{\rm low}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} \mathrm{GAP}\big(W_{\rm low}(F_i^k)\big) \in \mathbb{R}^C$$

$$\overline{F}_{\rm high}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} \mathrm{GAP}\big(W_{\rm high}(F_i^k)\big) \in \mathbb{R}^C$$

For a test image, prototypes $\overline{F}_{\rm low}^{\text{test}}$ and $\overline{F}_{\rm high}^{\text{test}}$ are extracted. Cosine similarity is measured for each source domain $k$:

$$s_k = \frac{1}{2} \left[ \frac{\langle \overline{F}_{\rm low}^{\text{test}}, \overline{F}_{\rm low}^{k} \rangle}{\|\overline{F}_{\rm low}^{\text{test}}\| \, \|\overline{F}_{\rm low}^{k}\|} + \frac{\langle \overline{F}_{\rm high}^{\text{test}}, \overline{F}_{\rm high}^{k} \rangle}{\|\overline{F}_{\rm high}^{\text{test}}\| \, \|\overline{F}_{\rm high}^{k}\|} \right]$$

Softmax weights (temperature $\tau$) yield the fusion coefficients:

$$w_k = \frac{e^{s_k/\tau}}{\sum_{j=1}^{K} e^{s_j/\tau}}$$

$$\mathbf{F}_{\rm fused} = \sum_{k=1}^{K} w_k \mathbf{F}_{\rm SDM}^{(k)}$$

This mechanism enables dynamic fusion of domain-adapted features based on frequency-domain similarity, constituting a frequency-driven soft domain selection (Wang et al., 9 Jan 2026).
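The selection mechanism can be sketched in pure Python; the prototype vectors, `tau`, and the scalar stand-ins for domain features below are illustrative assumptions:

```python
import math

# Sketch of FADF's frequency-driven soft domain selection: cosine similarity
# between test-image prototypes and per-domain prototypes, softmax with
# temperature tau, then a weighted fusion.

def cos(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def fadf_weights(low_t, high_t, protos, tau=0.5):
    # s_k = mean of low- and high-frequency cosine similarities
    s = [0.5 * (cos(low_t, pl) + cos(high_t, ph)) for pl, ph in protos]
    e = [math.exp(sk / tau) for sk in s]
    z = sum(e)
    return [ek / z for ek in e]

# Per-domain (low, high) frequency prototypes for K = 3 source domains
protos = [([1.0, 0.0], [0.0, 1.0]),
          ([0.0, 1.0], [1.0, 0.0]),
          ([0.7, 0.7], [0.7, 0.7])]

# A test image whose frequency statistics resemble domain 0
w = fadf_weights([0.9, 0.1], [0.1, 0.9], protos)
print(w)

# Fused feature = weighted sum of domain-specific features (scalars here)
F_sdm = [1.0, 2.0, 3.0]
F_fused = sum(wk * fk for wk, fk in zip(w, F_sdm))
```

Because the test image's prototypes align best with domain 0, the softmax assigns it the largest weight, while the temperature controls how sharply the fusion concentrates on that domain.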

4. Hierarchical Mask Generation and Refinement

The HMPR module mitigates detail loss observed in SAM's single-stage decoder. The two-stage process proceeds as:

  • Stage 1: Decoder $D_1$ receives $(\mathbf{F}_{\rm fused}, P_e)$ (spatial + prompt embeddings), producing mask $M_{256}$ at $256 \times 256$ resolution.
  • Prompt Feedback: $M_{256}$ is re-encoded as prompt $P'$ via the SAM prompt encoder $\mathcal{P}(\cdot)$, followed by self-attention to model long-range dependencies among vessel segments.
  • Stage 2: Decoder $D_2$ takes $(\mathbf{F}_{\rm fused}, P')$, outputs $M_{512}$ at $512 \times 512$, which is bilinearly upsampled to $\hat{y} \in \mathbb{R}^{H \times W}$.

Iterative refinement with mask-based prompts and attention stages preserves fine vessel details, including capillaries and small branches commonly lost in direct upsampling workflows (Wang et al., 9 Jan 2026).
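A shape-level sketch of the two-stage regime (the decoders and the SAM prompt encoder are stubs that track only resolutions; the input size is an arbitrary assumption):

```python
# Shape-level sketch of HMPR's two-stage decoding. Decoders and the SAM
# prompt encoder are stubs; only mask resolutions flow through.

H, W = 605, 700            # illustrative input resolution

def decoder_stage1(feat, prompt):
    return (256, 256)                      # coarse mask M_256

def prompt_encode(mask):
    # Re-encode the coarse mask as a prompt; self-attention is shape-preserving
    return ("prompt_from", mask)

def decoder_stage2(feat, prompt):
    return (512, 512)                      # refined mask M_512

def upsample(mask, h, w):
    return (h, w)                          # bilinear upsampling (shape only)

feat = ("F_fused",)
m256 = decoder_stage1(feat, ("P_e",))
m512 = decoder_stage2(feat, prompt_encode(m256))
y_hat = upsample(m512, H, W)
print(m256, m512, y_hat)
```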

5. Optimization and Training Protocols

WaveRNet is trained via a composite objective:

$$\mathcal{L}_{\rm total} = \lambda_1 \mathcal{L}_{\rm Dice} + \lambda_2 \mathcal{L}_{\rm Focal} + \mathcal{L}_{\rm MSE}$$

with $\lambda_1 = 1.0$, $\lambda_2 = 20.0$. The loss terms are:

  • Dice Loss: $1 - \dfrac{2 \sum (\hat{y} \cdot y)}{\sum \hat{y} + \sum y}$
  • Focal Loss ($\gamma = 2$): $-y (1-\hat{y})^\gamma \log \hat{y} - (1-y) \hat{y}^\gamma \log(1-\hat{y})$
  • IoU-prediction MSE: $(\hat{S} - s_{\rm IoU})^2$, where $s_{\rm IoU}$ is the actual IoU, providing mask-level self-assessment.
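A pure-Python sketch of the composite objective on flattened toy masks (the weights and $\gamma$ follow the reported values; the predictions and IoU scores are illustrative assumptions):

```python
import math

# Composite loss on flattened masks: Dice + weighted Focal + IoU-prediction MSE.

def dice_loss(p, y):
    return 1 - 2 * sum(pi * yi for pi, yi in zip(p, y)) / (sum(p) + sum(y))

def focal_loss(p, y, gamma=2.0):
    # Mean over pixels of the symmetric focal term; p must lie in (0, 1)
    n = len(p)
    return sum(-yi * (1 - pi) ** gamma * math.log(pi)
               - (1 - yi) * pi ** gamma * math.log(1 - pi)
               for pi, yi in zip(p, y)) / n

def iou_mse(iou_pred, iou_true):
    return (iou_pred - iou_true) ** 2

def total_loss(p, y, iou_pred, iou_true, lam1=1.0, lam2=20.0):
    return lam1 * dice_loss(p, y) + lam2 * focal_loss(p, y) + iou_mse(iou_pred, iou_true)

y = [1.0, 1.0, 0.0, 0.0]            # ground-truth vessel mask (flattened)
p_good = [0.9, 0.9, 0.1, 0.1]       # confident, mostly correct prediction
p_bad = [0.5, 0.5, 0.5, 0.5]        # uninformative prediction

print(total_loss(p_good, y, 0.8, 0.8), total_loss(p_bad, y, 0.5, 0.5))
```

The focal term's heavy weight ($\lambda_2 = 20$) reflects the severe foreground/background imbalance of thin vessels against the fundus background.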

Training follows the Leave-One-Domain-Out (LODO) protocol on DRIVE, STARE, CHASE_DB1, and RECOVERY-FA19 datasets. Metrics include Dice, IoU, and F1 (Wang et al., 9 Jan 2026).

6. Empirical Results and Ablation Analyses

WaveRNet exhibits high performance in both in-domain and cross-domain evaluation:

  • In-domain: Matches or slightly surpasses state-of-the-art U-Net variants; e.g., Dice ≈ 80.46% on DRIVE.
  • LODO cross-domain: U-Net variants degrade (Dice ≈ 25–40%), while SAM-based fine-tuning approaches plateau around 31–55%. WaveRNet achieves an average Dice ≈ 69.5% (next best: SAM-Med2D-FT at ≈ 60.1%). Per-domain Dice under LODO: DRIVE 78.55%, STARE 81.06%, CHASE_DB1 76.58%, RECOVERY-FA19 41.75%.

Ablations validate the contributions of core modules:

  • SDM provides the largest single gain under LODO (+6.79% Dice).
  • Both low- and high-frequency branches are essential; omitting either degrades LODO performance substantially.
  • HMPR is effective only when coupled with frequency-domain adaptation, supporting the necessity of joint frequency segmentation and hierarchical refinement.

7. Implementation Specifics

The model is implemented in PyTorch 2.1.0 with CUDA 12.8 and mixed-precision training on an NVIDIA RTX 4070 Ti GPU. Typical settings include batch size 2, 100 epochs, the Adam optimizer (learning rate $1 \times 10^{-4}$, exponential decay of $0.98$ per epoch), and initialization parameters $\alpha = 0.1$, $\tau = 0.5$, and domain-token std $= 0.02$. Image preprocessing replicates the official SAM pipeline (Wang et al., 9 Jan 2026).
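The reported decay implies the following learning-rate trajectory (a sketch assuming a simple per-epoch exponential schedule, as in PyTorch's ExponentialLR):

```python
# Learning-rate trajectory for the reported settings: Adam base lr 1e-4
# decayed by a factor of 0.98 after each of 100 epochs.

base_lr, gamma, epochs = 1e-4, 0.98, 100

lrs = [base_lr * gamma ** e for e in range(epochs)]
print(lrs[0], lrs[-1])   # initial and final learning rates
```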

The modular design and empirical validation of WaveRNet establish it as a reference method for robust domain-generalized segmentation in retinal imaging, providing a foundation for future frequency-based domain adaptation and fine-structure preservation strategies in medical image analysis.

References (1)
