Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting
Abstract: This paper presents a practical training method for human matting, a task that demands delicate pixel-level identification of human regions and therefore laborious annotation. To reduce the annotation cost, most existing matting approaches rely on image synthesis to augment the dataset. However, synthesized training images look unnatural, which introduces a domain generalization problem on natural images. To address this challenge, we introduce a new learning paradigm, weakly semi-supervised human matting (WSSHM), which uses a small amount of expensive matte labels together with a large amount of budget-friendly segmentation labels to reduce the annotation cost and resolve the domain generalization problem. To achieve the goal of WSSHM, we propose a simple and effective training method, Matte Label Blending (MLB), that selectively passes only the beneficial knowledge from the segmentation and matte data to the matting model. Extensive experiments and detailed analysis demonstrate that our method substantially improves the robustness of the matting model using a few matte data and numerous segmentation data. Our training method is also easily applicable to real-time models, achieving competitive accuracy at very high inference speed (328 FPS on an NVIDIA V100 GPU). The implementation code is available at https://github.com/clovaai/WSSHM.
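The abstract does not spell out how synthesized training images are produced. As a minimal sketch, assuming the standard compositing equation I = alpha * F + (1 - alpha) * B commonly used in matting work, the snippet below blends a foreground over a new background. It illustrates the image-synthesis augmentation the abstract refers to, not the MLB method itself; the function name `composite` and the toy pixel values are hypothetical.

```python
def composite(alpha, foreground, background):
    """Blend per pixel with I = alpha * F + (1 - alpha) * B.

    alpha: list of floats in [0, 1], one per pixel.
    foreground, background: lists of (r, g, b) tuples, one per pixel.
    """
    if not (len(alpha) == len(foreground) == len(background)):
        raise ValueError("alpha, foreground and background must have the same length")
    image = []
    for a, f, b in zip(alpha, foreground, background):
        # Blend each colour channel of this pixel independently.
        image.append(tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b)))
    return image


# Three toy pixels: pure foreground, a soft edge (e.g. hair), pure background.
alpha = [1.0, 0.5, 0.0]
foreground = [(200.0, 100.0, 50.0)] * 3
background = [(0.0, 0.0, 100.0)] * 3
synthetic = composite(alpha, foreground, background)
```

Pasting an arbitrary foreground onto an unrelated background in this way is one reason such composites can look unnatural, which is the domain gap the abstract describes.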