CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection
Abstract: Feature pyramids have been widely adopted in convolutional neural networks and transformers for medical image segmentation. However, existing models generally focus on the encoder-side Transformer for feature extraction, and we explore how much a well-designed architecture can improve the feature decoder. We propose the Cross Feature Pyramid Transformer decoder (CFPFormer), a novel decoder block that integrates feature pyramids and transformers. Although transformer-like architectures achieve outstanding segmentation performance, their redundancy and training cost remain concerns. By leveraging patch embedding and cross-layer feature concatenation, CFPFormer enhances feature extraction, while our Gaussian Attention mitigates the complexity issue. Benefiting from the Transformer structure and U-shaped connections, CFPFormer captures long-range dependencies and up-samples feature maps effectively. We evaluate CFPFormer on medical image segmentation datasets: with a ResNet50 backbone, our method achieves a 92.02% Dice score. Notably, our VGG-based model outperforms baselines with more complex ViT and Swin Transformer backbones.
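For readers unfamiliar with the metric quoted in the abstract, the Dice score between a predicted mask and a ground-truth mask is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that formula for binary masks, not the authors' evaluation code. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not state how scores are averaged across classes or cases.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper (hypothetical name); not taken from the paper.
    """
    # Flatten both masks to 0/1 integers.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |A| + |B|: total foreground pixels across the two masks.
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(f"Dice: {dice_score(pred, target):.4f}")
```

In this toy case the masks share 2 foreground pixels and each contains 3, so the score is 4/6. A reported figure such as 92.02% corresponds to this ratio expressed as a percentage, typically averaged over classes or test cases.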