DART: Depth-Enhanced Accurate and Real-Time Background Matting
Abstract: Matting against a static background, often referred to as "Background Matting" (BGM), has garnered significant attention in the computer vision community due to its pivotal role in practical applications such as webcasting and photo editing. Nevertheless, highly accurate background matting remains a formidable challenge, primarily owing to limitations inherent in conventional RGB images: susceptibility to varying lighting conditions and unforeseen shadows. In this paper, we present DART, which leverages the rich depth information provided by RGB-Depth (RGB-D) cameras to enhance background matting performance in real time. First, we adapt the original RGB-based BGM algorithm to incorporate depth information. The resulting model's output is then refined through Bayesian inference with a background depth prior. The posterior prediction is converted into a trimap, which is fed into a state-of-the-art matting algorithm to produce more precise alpha mattes. To achieve real-time matting, a critical requirement for many real-world applications, we distill the backbone of our model from a larger and more versatile BGM network. Our experiments demonstrate the superior performance of the proposed method. Moreover, thanks to the distillation step, our method runs at 33 frames per second (fps) on a mid-range edge-computing device, underscoring DART's potential for deployment in mobile applications.
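The abstract's trimap-generation step can be illustrated with a minimal sketch: the refined per-pixel foreground posterior is thresholded into the three trimap classes (definite foreground, definite background, unknown). The function name and the threshold values below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def posterior_to_trimap(posterior, fg_thresh=0.9, bg_thresh=0.1):
    """Convert a per-pixel foreground posterior (values in [0, 1]) to a
    trimap: 255 = foreground, 0 = background, 128 = unknown.
    Thresholds are hypothetical defaults for illustration."""
    trimap = np.full(posterior.shape, 128, dtype=np.uint8)  # start as unknown
    trimap[posterior >= fg_thresh] = 255  # confident foreground
    trimap[posterior <= bg_thresh] = 0    # confident background
    return trimap

# Toy posterior: confident background, uncertain pixel, confident foreground.
p = np.array([[0.02, 0.50, 0.97]])
trimap = posterior_to_trimap(p)
```

The resulting trimap would then be handed to a trimap-based matting model, which resolves the unknown band into fractional alpha values.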