Phi-SegNet: Phase-Aware Medical Segmentation
- The paper presents a CNN that leverages Fourier phase priors in both architecture and loss to enhance boundary detection in medical images.
- The Bi-Feature Mask Former and Reverse Fourier Attention modules fuse multi-scale features and apply phase-aware refinement to capture fine spatial details.
- Quantitative evaluations show an average IoU improvement of +1.54% over competitors, confirming robust performance across various imaging modalities.
Phi-SegNet is a convolutional neural network (CNN) architecture designed for medical image segmentation, distinguished by its explicit integration of spectral (Fourier phase) priors into both architectural design and loss function. By leveraging phase-aware supervision, Phi-SegNet aims to improve boundary fidelity and generalization across diverse imaging modalities, addressing limitations of traditional spatial-only encoding schemes prevalent in CNN and Transformer-based segmentation networks (Ali et al., 22 Jan 2026).
1. Architectural Foundations and Motivation
Phi-SegNet adopts an encoder-decoder paradigm with EfficientNet-B4 (pre-trained on ImageNet) serving as its encoder, optimized for robust feature extraction. The decoder pathways are augmented by two critical modules: Bi-Feature Mask Former (BFMF) for multi-scale feature fusion, and Reverse Fourier Attention (RFA) for phase-regularized spectral refinement. Single-channel “boundary masks” are produced at every decoder stage by phase (φ)-conditioners, which serve both as inputs to RFA and direct targets for spectral supervision. The explicit motivation for phase integration lies in the Fourier phase spectrum’s capacity to encode spatial alignment and contour geometry, thus facilitating the learning of sharper, more consistent object boundaries, particularly advantageous for medical images characterized by low contrast and noise.
2. Bi-Feature Mask Former (BFMF) Module
The BFMF module fuses adjacent encoder features to bridge semantic gaps between high-resolution spatial details and lower-resolution semantic context. Specifically:
- Inputs: a high-resolution encoder feature map $F_h$ and the adjacent lower-resolution encoder feature $F_l$.
- Multi-Kernel Convolutions (MkC): three parallel convolutions with distinct kernel sizes and dilation rates are applied to each input, capturing boundary cues at multiple receptive fields.
- Feature Blending: the multi-kernel outputs are fused as $F = \mathrm{Cat}\big(F_h, \mathrm{Up}(F_l)\big)$, where $\mathrm{Cat}(\cdot)$ denotes channel-wise concatenation and $\mathrm{Up}(\cdot)$ is 2× upsampling.
- Mask Generation: the blended features are projected to a single-channel boundary mask that feeds the subsequent RFA block.
- Interpretation: this multi-scale fusion encodes fine detail and contextual semantics within the mask features, enhancing sensitivity to both boundaries and global structure.
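The blending step above can be illustrated with a minimal numpy sketch. The function names, channel counts, and the nearest-neighbour upsampling choice are illustrative assumptions, not the paper's exact implementation (which uses learned convolutions):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map,
    # standing in for the BFMF's learned Up(.) operator.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def bfmf_blend(f_high, f_low):
    # Channel-wise concatenation of the high-resolution feature with
    # the 2x-upsampled low-resolution feature (the blending step).
    return np.concatenate([f_high, upsample2x(f_low)], axis=0)

f_high = np.random.randn(32, 64, 64)   # high-resolution encoder feature
f_low = np.random.randn(64, 32, 32)    # adjacent lower-resolution feature
fused = bfmf_blend(f_high, f_low)
print(fused.shape)  # (96, 64, 64)
```

A mask head (e.g., a 1×1 convolution) would then project the 96-channel fused tensor to the single-channel boundary mask.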
3. Reverse Fourier Attention (RFA) Block
The RFA block is central to spectral supervision and boundary refinement:
- φ-Conditioner Output: each decoder stage $s$ produces a single-channel boundary mask $M_s$.
- Reverse Mask and Spectral Filtering: the complementary mask $1 - M_s$ undergoes a 2D Discrete Fourier Transform ($\mathcal{F}$), followed by a centered low-pass filter (LPF) with cutoff radius $r$: $\hat{M}_s = \mathrm{LPF}_r\big(\mathcal{F}(1 - M_s)\big)$.
- After the inverse transform and modulus, $A_s = \big|\mathcal{F}^{-1}(\hat{M}_s)\big|$ yields a spatial attention map.
- Attention Application: decoder features $D_s$ are modulated element-wise, $D_s' = D_s \odot A_s$.
- Significance: the LPF prioritizes dominant low-frequency structures, denoising edges and guiding the network toward global structural form rather than spurious high-frequency detail.
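The reverse-mask, low-pass, and modulation steps can be sketched in numpy as follows. The cutoff radius and array shapes are illustrative assumptions; the paper's RFA operates on learned feature tensors:

```python
import numpy as np

def centered_lowpass(h, w, cutoff):
    # Binary centered low-pass filter: 1 within `cutoff` of the
    # spectrum center, 0 outside.
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    return (dist <= cutoff).astype(float)

def rfa_attention(mask, features, cutoff=16):
    # Reverse the boundary mask, low-pass filter it in the Fourier
    # domain, then use the modulus of the inverse transform as an
    # element-wise attention map over the decoder features.
    rev = 1.0 - mask
    spec = np.fft.fftshift(np.fft.fft2(rev))            # centered spectrum
    filt = spec * centered_lowpass(*mask.shape, cutoff)  # spectral LPF
    attn = np.abs(np.fft.ifft2(np.fft.ifftshift(filt)))  # inverse + modulus
    return features * attn                               # broadcast over channels

mask = (np.random.rand(64, 64) > 0.5).astype(float)
feats = np.random.randn(8, 64, 64)
out = rfa_attention(mask, feats)
print(out.shape)  # (8, 64, 64)
```

Because the attention map is a modulus, it is non-negative, so the modulation rescales feature magnitudes without flipping their signs.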
4. Phase-Aware Supervision and Loss Function
Phi-SegNet employs a dual-component loss to balance spectral coherence with spatial overlap:
- Phase Extraction: for each decoder stage $s$, the unwrapped Fourier phase of the predicted mask is computed, $\phi_s = \mathrm{unwrap}\big(\angle\,\mathcal{F}(M_s)\big)$.
- Phase Alignment Loss: $\mathcal{L}_{\text{phase}} = \sum_s \big\| \phi_s - \phi_G \big\|$, where $\phi_G = \mathrm{unwrap}\big(\angle\,\mathcal{F}(G)\big)$ and $G$ denotes the ground-truth mask.
- Spatial (IoU) Loss: $\mathcal{L}_{\text{IoU}} = 1 - \dfrac{|P \cap G|}{|P \cup G|}$, computed on the predicted segmentation $P$.
- Total Loss: $\mathcal{L} = \lambda_{\text{phase}} \mathcal{L}_{\text{phase}} + \lambda_{\text{IoU}} \mathcal{L}_{\text{IoU}}$,
with weighting coefficients $\lambda_{\text{phase}}$ and $\lambda_{\text{IoU}}$ balancing the two terms; φ-conditioner outputs create a closed feedback loop with the RFA blocks.
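A minimal numpy sketch of this dual-component loss follows. It assumes an L1 distance on raw (not unwrapped) phase spectra and unit weights, so it is a hedged stand-in for the paper's formulation rather than a reproduction of it:

```python
import numpy as np

def phase_loss(pred_mask, gt_mask):
    # Mean absolute difference between Fourier phase spectra
    # (assumed L1 form; the paper additionally unwraps the phase).
    phi_p = np.angle(np.fft.fft2(pred_mask))
    phi_g = np.angle(np.fft.fft2(gt_mask))
    return np.mean(np.abs(phi_p - phi_g))

def soft_iou_loss(pred, gt, eps=1e-7):
    # Standard soft-IoU loss on probability maps in [0, 1].
    inter = np.sum(pred * gt)
    union = np.sum(pred) + np.sum(gt) - inter
    return 1.0 - inter / (union + eps)

def total_loss(pred, gt, lam_phase=1.0, lam_iou=1.0):
    # Weighted sum; the actual coefficients used in training are
    # not reproduced in this summary, so unit weights are assumed.
    return lam_phase * phase_loss(pred, gt) + lam_iou * soft_iou_loss(pred, gt)
```

For a perfect prediction both terms vanish, so `total_loss` approaches zero, which is the sanity check one would expect of a combined spectral-spatial objective.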
5. Training and Evaluation Procedures
Training employs the Adam optimizer with a cosine annealing learning-rate schedule over 150 epochs and a batch size of 4. Image pre-processing consists of resizing to 256×256, random flips, affine transformations, and multi-scale training. The implementation uses PyTorch 2.3.0 and CUDA 12.1 on 4× NVIDIA RTX 2080 Ti GPUs. Evaluation spans five datasets: BUSI (breast ultrasound), TDD (dental X-ray), Kvasir-SEG (colon polyps), GLaS (glands), and PROMISE-12 (prostate MRI), with external hold-out sets (UDIAT, Mendeley, CVC-ColonDB) for cross-dataset generalization.
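The cosine annealing schedule can be written in closed form as below. The initial and final learning rates are illustrative placeholders, since the summary does not reproduce the paper's values:

```python
import numpy as np

def cosine_lr(epoch, total_epochs=150, lr_max=1e-4, lr_min=0.0):
    # Cosine annealing from lr_max down to lr_min over total_epochs.
    # lr_max and lr_min are assumed placeholder values.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * epoch / total_epochs))
```

In a PyTorch training loop this corresponds to pairing `torch.optim.Adam` with `torch.optim.lr_scheduler.CosineAnnealingLR`.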
6. Quantitative Performance and Ablation
Phi-SegNet achieves the following test IoU and F1 scores:
| Dataset | IoU | F1 |
|---|---|---|
| BUSI | 84.54% | 91.98% |
| TDD | 85.37% | 93.34% |
| Kvasir-SEG | 84.96% | 92.24% |
| GLaS | 83.83% | 91.49% |
| PROMISE-12 | 83.62% | 91.50% |
The average relative improvement over the next best-performing model is IoU +1.54% ± 1.26%, F1 +0.98% ± 0.71%. Unified training across modalities yields merged IoU of 78.97% and F1 of 87.52%. Cross-dataset tests maintain high performance, e.g., IoU of 70.43% on unseen UDIAT data. Ablation analyses confirm that each module—phase loss, BFMF, RFA—contributes consistently to performance gains.
7. Limitations, Insights, and Future Directions
Spectral priors enhance segmentation by encoding geometry and positional cues invariant to contrast fluctuations; RFA blocks denoise high-frequency clutter while retaining structure; phase alignment loss enforces boundary accuracy in frequency space. The current implementation incurs an increased parameter count (≈60 M) and computational load (≈82 G FLOPs), and uses a fixed LPF cutoff, which may not be universally optimal. Future work may include lightweight model distillation, adaptive spectral filtering dependent on modality, and semi-supervised or domain-adaptive schemes for enhanced generalization. A plausible implication is that phase-aware supervision could generalize to other tasks requiring fine-grained spatial discrimination in the presence of ambiguous or noisy input (Ali et al., 22 Jan 2026).
In conclusion, Phi-SegNet demonstrates that explicit phase-aware feedback and spectral regularization substantially advance medical image segmentation, enabling robust, modality-agnostic localization even for fine boundaries and structurally complex targets.