WaveRNet: Wavelet-Guided Retinal Segmentation
- WaveRNet is a wavelet-guided frequency learning framework that decomposes retinal images into low- and high-frequency components for robust, domain-generalized segmentation.
- It leverages specialized modules—including the Spectral-guided Domain Modulator, Frequency-Adaptive Domain Fusion, and Hierarchical Mask-Prompt Refiner—to dynamically fuse domain-specific features and refine segmentation masks.
- Empirical evaluations demonstrate superior cross-domain performance with higher Dice scores compared to SAM-based and U-Net approaches under Leave-One-Domain-Out protocols.
WaveRNet is a wavelet-guided frequency learning architecture designed for multi-source domain-generalized retinal vessel segmentation. Developed to address the pervasive challenge of domain shift due to non-uniform illumination, contrast variation, and the need to preserve fine vessel structures, WaveRNet advances over existing Segment Anything Model (SAM)-based approaches by incorporating explicit frequency-domain decomposition with adaptive domain fusion and hierarchical mask refinement. Its architecture integrates a Spectral-guided Domain Modulator (SDM), Frequency-Adaptive Domain Fusion (FADF), and a Hierarchical Mask-Prompt Refiner (HMPR) for robust generalization across heterogeneous retinal imaging domains (Wang et al., 9 Jan 2026).
1. Core Architectural Components
WaveRNet processes a retinal image using a series of specialized modules:
- SAM Image Encoder with Adapters: Uses the ViT-B backbone from SAM with lightweight adapters inserted into each transformer block. All SAM weights are frozen. The encoder generates a feature map $F$ for downstream frequency processing.
- Spectral-guided Domain Modulator (SDM): Decomposes $F$ into low- and high-frequency branches using learnable convolutional "wavelet" layers. These branches are fused and modulated by learnable domain tokens projected by per-domain MLPs, producing domain-specific features $F_d$.
- Frequency-Adaptive Domain Fusion (FADF): At test time, lacking domain labels, the model computes frequency prototypes $p_d$ for each source domain and $p_{\text{test}}$ for the test image. Cosine similarity between these prototypes yields softmax weights $w_d$, used to fuse the domain-specific features: $F_{\text{fused}} = \sum_d w_d F_d$.
- Hierarchical Mask-Prompt Refiner (HMPR): Uses a two-stage decoder regime. Stage 1 yields a coarse mask $M_1$ at reduced resolution; this is re-encoded as a prompt, refined with self-attention, and supplied to Stage 2 to generate a finer mask $M_2$, which is bilinearly upsampled to the input resolution.
This architecture facilitates explicit disentanglement of frequency-domain information and dynamic domain adaptation during both training and inference (Wang et al., 9 Jan 2026).
2. Frequency Decomposition and Domain Modulation
Classically, a 2D discrete wavelet transform (DWT) produces low-frequency ($F_L$) and high-frequency ($F_H$) components using fixed filters $h$, $g$:

$$F_L = (F * h)\downarrow_2, \qquad F_H = (F * g)\downarrow_2$$

WaveRNet replaces the fixed filters with parameterized convolutional branches:

$$\hat{F}_L = \mathrm{Conv}_L(F), \qquad \hat{F}_H = \mathrm{Conv}_H(F)$$

These are fused as:

$$F' = \hat{F}_L + \hat{F}_H + \alpha F$$

where $\alpha$ is a learnable residual scalar. This preserves both illumination-robust low-frequency content and the high-frequency vessel-boundary structures critical for segmentation.
SDM operates by associating each domain $d$ with a learnable token $t_d$, projected via a two-layer MLP:

$$\tilde{t}_d = W_2\,\sigma(W_1 t_d)$$

This token is spatially broadcast and added to the fused frequency features to produce the domain-specific features $F_d$. During training, the token corresponding to the sample's domain is used, enabling domain awareness during representation learning (Wang et al., 9 Jan 2026).
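As a concrete illustration, the frequency-branch-plus-token pattern described above can be sketched in PyTorch. This is a minimal sketch under stated assumptions: the branch depth, kernel sizes, token dimension, and residual-fusion form are illustrative choices, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SpectralDomainModulator(nn.Module):
    """Illustrative SDM-style block: learnable low/high-frequency conv
    branches, residual fusion, and per-domain token modulation. Layer
    choices are assumptions, not the paper's exact design."""

    def __init__(self, channels: int, num_domains: int, token_dim: int = 64):
        super().__init__()
        # Learnable "wavelet" branches replacing the fixed DWT filters.
        self.low_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.high_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable residual scalar
        # One learnable token per source domain, projected by a per-domain MLP.
        self.domain_tokens = nn.Parameter(torch.randn(num_domains, token_dim) * 0.02)
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(token_dim, token_dim), nn.GELU(),
                          nn.Linear(token_dim, channels))
            for _ in range(num_domains)
        ])

    def forward(self, feats: torch.Tensor, domain_id: int) -> torch.Tensor:
        # Fuse the frequency branches with a scaled residual connection.
        fused = self.low_branch(feats) + self.high_branch(feats) + self.alpha * feats
        # Project the domain token, broadcast spatially, and add it.
        token = self.mlps[domain_id](self.domain_tokens[domain_id])  # (channels,)
        return fused + token.view(1, -1, 1, 1)
```

During training, `domain_id` is the known source-domain label of each sample; at test time the per-domain outputs would instead be combined by FADF.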
3. Frequency-Adaptive Domain Fusion (FADF)
FADF addresses the problem of domain identification at test time, when no domain label is available. After training, a prototype is computed for each source domain $d$ by averaging spatially pooled frequency features (the fused output of the SDM frequency branches) over that domain's training set:

$$p_d = \frac{1}{N_d} \sum_{x \in \mathcal{D}_d} \mathrm{GAP}\big(F'(x)\big)$$

For a test image, the analogous prototype $p_{\text{test}}$ is extracted. Cosine similarity is measured for each source domain $d$:

$$s_d = \frac{p_{\text{test}} \cdot p_d}{\lVert p_{\text{test}} \rVert \, \lVert p_d \rVert}$$

Softmax weights (temperature $\tau$) yield the fusion coefficients:

$$w_d = \frac{\exp(s_d/\tau)}{\sum_{d'} \exp(s_{d'}/\tau)}, \qquad F_{\text{fused}} = \sum_d w_d F_d$$
This mechanism enables dynamic fusion of domain-adapted features based on frequency-domain similarity, constituting a frequency-driven soft domain selection (Wang et al., 9 Jan 2026).
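The prototype-similarity weighting can be sketched in a few lines of NumPy; the temperature value and the toy prototypes below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fadf_weights(test_proto: np.ndarray, domain_protos: np.ndarray,
                 tau: float = 0.1) -> np.ndarray:
    """Cosine similarity between the test prototype and each source-domain
    prototype, turned into softmax fusion weights (temperature tau)."""
    sims = domain_protos @ test_proto / (
        np.linalg.norm(domain_protos, axis=1) * np.linalg.norm(test_proto) + 1e-8
    )
    z = np.exp((sims - sims.max()) / tau)  # numerically stabilized softmax
    return z / z.sum()

# Toy usage: three source domains; the test image resembles domain 0 most,
# so domain 0 receives the largest fusion weight.
protos = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
w = fadf_weights(np.array([0.9, 0.1]), protos)
```

The resulting weights `w` would then blend the domain-specific feature maps, $F_{\text{fused}} = \sum_d w_d F_d$, without ever requiring a hard domain label at inference.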
4. Hierarchical Mask Generation and Refinement
The HMPR module mitigates detail loss observed in SAM's single-stage decoder. The two-stage process proceeds as:
- Stage 1: The decoder receives the image embedding together with prompt embeddings, producing a coarse mask $M_1$ at reduced resolution.
- Prompt Feedback: $M_1$ is re-encoded as a prompt via the SAM prompt encoder, then passed through self-attention to model long-range dependencies among vessel segments.
- Stage 2: The decoder takes the refined prompt and outputs a finer mask $M_2$, which is bilinearly upsampled to the full input resolution.
Iterative refinement with mask-based prompts and attention stages preserves fine vessel details, including capillaries and small branches commonly lost in direct upsampling workflows (Wang et al., 9 Jan 2026).
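The two-stage control flow above can be sketched as a single function. Here `decoder`, `prompt_encoder`, and `prompt_attn` are stand-ins for the SAM mask decoder, SAM prompt encoder, and a self-attention block; their call signatures are assumptions for illustration, not SAM's actual API.

```python
import torch
import torch.nn.functional as F

def hmpr_forward(image_emb, decoder, prompt_encoder, prompt_attn, out_size):
    """Illustrative HMPR-style two-stage decode; component signatures
    are hypothetical stand-ins, not the official SAM interfaces."""
    # Stage 1: coarse mask M1 from the image embedding.
    coarse = decoder(image_emb, prompt=None)
    # Prompt feedback: re-encode M1 as a prompt, refine with self-attention
    # to capture long-range dependencies among vessel segments.
    prompt = prompt_attn(prompt_encoder(coarse))
    # Stage 2: decode again with the mask-derived prompt (finer mask M2),
    # then bilinearly upsample to the target resolution.
    fine = decoder(image_emb, prompt=prompt)
    return F.interpolate(fine, size=out_size, mode="bilinear", align_corners=False)
```

The key design point is that the second decode is conditioned on a learned, attention-refined encoding of the first mask rather than on the raw upsampled logits.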
5. Optimization and Training Protocols
WaveRNet is trained via a composite objective:

$$\mathcal{L} = \mathcal{L}_{\mathrm{Dice}} + \lambda_1 \mathcal{L}_{\mathrm{focal}} + \lambda_2 \mathcal{L}_{\mathrm{iou}}$$

with weighting coefficients $\lambda_1$ and $\lambda_2$. The loss terms are:
- Dice Loss: $\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_i p_i g_i + \epsilon}{\sum_i p_i + \sum_i g_i + \epsilon}$, with predicted probabilities $p_i$, ground-truth labels $g_i$, and smoothing constant $\epsilon$.
- Focal Loss: $\mathcal{L}_{\mathrm{focal}} = -\frac{1}{N}\sum_i (1 - p_{t,i})^{\gamma}\log p_{t,i}$, where $p_{t,i}$ is the probability assigned to the true class and $\gamma$ is the focusing parameter.
- IoU-prediction MSE: $\mathcal{L}_{\mathrm{iou}} = (\hat{u} - u)^2$, where $u$ is the actual IoU of the predicted mask and $\hat{u}$ the network's estimate, providing mask-level self-assessment.
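The three loss terms can be sketched in NumPy. The smoothing constant, the 0.5 binarization threshold, and $\gamma = 2.0$ are assumed defaults here; the paper's exact values are not reproduced.

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """Soft Dice loss over a flattened probability map p and binary mask g."""
    return 1.0 - (2.0 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps)

def focal_loss(p, g, gamma=2.0, eps=1e-6):
    """Focal loss: down-weights easy pixels via the (1 - p_t)^gamma factor.
    gamma=2.0 is the common default, assumed here."""
    pt = np.where(g == 1, p, 1.0 - p)  # probability assigned to the true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + eps)))

def iou_mse(iou_pred, p, g, thresh=0.5):
    """MSE between the predicted IoU and the actual IoU of the binarized mask."""
    b = p > thresh
    inter = np.logical_and(b, g == 1).sum()
    union = np.logical_or(b, g == 1).sum()
    actual = inter / max(union, 1)
    return (iou_pred - actual) ** 2
```

A perfect prediction drives all three terms to zero, which is a quick sanity check when wiring up the composite objective.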
Training follows the Leave-One-Domain-Out (LODO) protocol on DRIVE, STARE, CHASE_DB1, and RECOVERY-FA19 datasets. Metrics include Dice, IoU, and F1 (Wang et al., 9 Jan 2026).
6. Empirical Results and Ablation Analyses
WaveRNet exhibits high performance in both in-domain and cross-domain evaluation:
- In-domain: Matches or slightly surpasses state-of-the-art U-Net variants; e.g., Dice ≈ 80.46% (DRIVE).
- LODO cross-domain: U-Net variants degrade (Dice ≈ 25–40%), while SAM-based fine-tuning approaches plateau around 31–55%. WaveRNet achieves avg Dice ≈ 69.5% (next best, SAM-Med2D-FT, ≈ 60.1%). For individual domains under LODO: DRIVE (78.55%), STARE (81.06%), CHASE_DB1 (76.58%), RECOVERY-FA19 (41.75%).
Ablations validate the contributions of core modules:
- SDM provides the largest single gain under LODO (+6.79% Dice).
- Both low- and high-frequency branches are essential; omitting either degrades LODO performance substantially.
- HMPR is effective only when coupled with frequency-domain adaptation, supporting the necessity of joint frequency segmentation and hierarchical refinement.
7. Implementation Specifics
The model is implemented in PyTorch 2.1.0 with CUDA 12.8 and mixed-precision training on an NVIDIA RTX 4070 Ti GPU. Typical settings include batch size 2, 100 epochs, the Adam optimizer with exponential learning-rate decay ($0.98$ per epoch), and domain tokens initialized with standard deviation 0.02. Image preprocessing replicates the official SAM pipeline (Wang et al., 9 Jan 2026).
The modular design and empirical validation of WaveRNet establish it as a reference method for robust domain-generalized segmentation in retinal imaging, providing a foundation for future frequency-based domain adaptation and fine-structure preservation strategies in medical image analysis.