Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation
Abstract: Most state-of-the-art methods for medical image segmentation adopt an encoder-decoder architecture. However, with only simple skip connections, this U-shaped framework is limited in capturing non-local, multi-scale information. To address this, we first examine the potential weaknesses of skip connections in U-Net on multiple segmentation tasks and find that i) not all skip connections are useful, and each contributes differently; and ii) the optimal combination of skip connections differs across datasets. Based on these findings, we propose a new segmentation framework, named UDTransNet, to address three semantic gaps in U-Net. Specifically, we propose a Dual Attention Transformer (DAT) module that captures channel- and spatial-wise relationships to better fuse the encoder features, and a Decoder-guided Recalibration Attention (DRA) module that connects the DAT tokens with the decoder features to eliminate their inconsistency. Together, the two modules establish a learnable connection that narrows the semantic gaps between the encoder and the decoder, leading to a high-performance segmentation model for medical images. Comprehensive experiments indicate that UDTransNet achieves higher evaluation scores and finer segmentation results with relatively fewer parameters than state-of-the-art segmentation methods on several public datasets. Code: https://github.com/McGregorWwww/UDTransNet.
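Findings i) and ii) suggest an ablation in which each skip connection is switched on or off independently and every resulting configuration is evaluated per dataset. The sketch below only enumerates those configurations; it is not the authors' code. The choice of four skip connections and the names `skip_configurations` and `describe` are assumptions made for illustration, and training and scoring a model for each mask is left out.

```python
from itertools import combinations


def skip_configurations(num_skips=4):
    """Yield every on/off mask over `num_skips` skip connections.

    Each mask is a tuple of booleans; mask[i] is True when the skip
    connection at encoder level i + 1 is kept. num_skips=4 is an
    assumption for this sketch, not a value taken from the paper.
    """
    levels = range(num_skips)
    for k in range(num_skips + 1):
        for kept in combinations(levels, k):
            yield tuple(i in kept for i in levels)


def describe(mask):
    """Return a short label such as 'L1+L3' for a mask (hypothetical helper)."""
    kept = [f"L{i + 1}" for i, on in enumerate(mask) if on]
    return "+".join(kept) if kept else "none"


configs = list(skip_configurations(4))
for mask in configs:
    # In a real ablation, a model would be trained and scored here
    # with only the skip connections selected by `mask` enabled.
    print(describe(mask))
```

With four binary choices there are 2^4 = 16 configurations, ranging from no skip connections at all to the full U-Net; comparing scores across them per dataset is one way to check whether the best combination is dataset-dependent.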