Deep Common Feature Mining for Efficient Video Semantic Segmentation
Abstract: Recent work on video semantic segmentation has made substantial progress by exploiting temporal correlations. Nevertheless, persistent challenges, including redundant computation and unreliable feature propagation, underscore the need for further innovation. In response, we present Deep Common Feature Mining (DCFM), a novel approach designed to address these challenges through feature sharing. DCFM explicitly decomposes features into two complementary components. The common representation, extracted from a key frame, furnishes essential high-level information to neighboring non-key frames and can be reused directly, without feature propagation. The independent feature, derived from each video frame, captures rapidly changing information and provides frame-specific cues crucial for segmentation. To achieve this decomposition, we employ a symmetric training strategy tailored to sparsely annotated data, enabling the backbone to learn a robust high-level representation enriched with common information. Additionally, we incorporate a self-supervised loss function to reinforce intra-class feature similarity and enhance temporal consistency. Experimental evaluations on the VSPW and Cityscapes datasets demonstrate the effectiveness of our method, showing a superior balance between accuracy and efficiency. The implementation is available at https://github.com/BUAAHugeGun/DCFM.
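The feature-sharing idea in the abstract can be illustrated with a minimal inference-loop sketch. This is not the authors' implementation: the function names (`segment_video`, `deep_backbone`, `shallow_encoder`, `head`), the fixed `key_interval` key-frame schedule, and the toy list-based "features" are all assumptions made for illustration. The sketch only shows the control flow: the expensive common representation is computed on key frames and reused as-is, while a cheap independent feature is computed for every frame.

```python
from typing import Callable, List, Sequence

Frame = List[float]    # toy stand-in for an image (assumption)
Feature = List[float]  # toy stand-in for a feature map (assumption)


def segment_video(
    frames: Sequence[Frame],
    deep_backbone: Callable[[Frame], Feature],
    shallow_encoder: Callable[[Frame], Feature],
    head: Callable[[Feature, Feature], List[int]],
    key_interval: int = 3,  # hypothetical fixed key-frame schedule
) -> List[List[int]]:
    """Compute the common feature on key frames only and reuse it elsewhere."""
    outputs: List[List[int]] = []
    common: Feature = []
    for t, frame in enumerate(frames):
        if t % key_interval == 0:
            # Key frame: refresh the shared high-level representation.
            common = deep_backbone(frame)
        # Every frame: cheap, frame-specific feature.
        independent = shallow_encoder(frame)
        # Non-key frames reuse `common` directly, with no propagation step.
        outputs.append(head(common, independent))
    return outputs


# Toy components that count how often each stage runs.
calls = {"deep": 0, "shallow": 0}


def toy_deep(frame: Frame) -> Feature:
    calls["deep"] += 1
    mean = sum(frame) / len(frame)
    return [mean] * len(frame)


def toy_shallow(frame: Frame) -> Feature:
    calls["shallow"] += 1
    return list(frame)


def toy_head(common: Feature, independent: Feature) -> List[int]:
    # Fuse both features and threshold into a per-position label.
    return [1 if c + i > 1.0 else 0 for c, i in zip(common, independent)]


frames = [[0.1 * t, 0.9, 0.5] for t in range(7)]
masks = segment_video(frames, toy_deep, toy_shallow, toy_head, key_interval=3)
print(calls, len(masks))
```

With seven frames and `key_interval=3`, the deep stage runs only on frames 0, 3, and 6, while the shallow stage runs on all seven. That call-count gap is where the efficiency gain described in the abstract would come from under these assumptions.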