MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving
Abstract: Instance segmentation is a fundamental research topic in computer vision, especially in autonomous driving. However, manual mask annotation for instance segmentation is time-consuming and costly. To address this problem, some prior works apply weak supervision using 2D or 3D boxes. However, no prior work has successfully segmented 2D and 3D instances simultaneously using only 2D box annotations, which could further reduce the annotation cost by an order of magnitude. We therefore propose a novel framework called Multimodal Weakly Supervised Instance Segmentation (MWSIS). It incorporates various fine-grained label generation and correction modules for both the 2D and 3D modalities to improve the quality of pseudo labels, along with a new multimodal cross-supervision approach, named Consistency Sparse Cross-modal Supervision (CSCS), which reduces the inconsistency of multimodal predictions through response distillation. In particular, transferring the 3D backbone to downstream tasks not only improves the performance of 3D detectors, but also outperforms fully supervised instance segmentation with only 5% fully supervised annotations. On the Waymo dataset, the proposed framework demonstrates significant improvements over the baseline, achieving increases of 2.59% mAP and 12.75% mAP for the 2D and 3D instance segmentation tasks, respectively. The code is available at https://github.com/jiangxb98/mwsis-plugin.
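To make the idea of cross-modal consistency concrete, the following is a minimal sketch and not the authors' CSCS implementation. It assumes LiDAR points have already been projected to integer pixel coordinates, and it uses a plain mean squared error in place of the response distillation loss, which the abstract does not specify. The function name, argument names, and array shapes are all hypothetical.

```python
import numpy as np

def sparse_cross_modal_consistency(mask_prob_2d, point_prob_3d, pixel_uv):
    """Mean squared disagreement between 2D and 3D foreground probabilities.

    Illustrative assumption only; not the loss used in the paper.
    mask_prob_2d:  (H, W) foreground probabilities from an image branch.
    point_prob_3d: (N,) foreground probabilities from a point-cloud branch.
    pixel_uv:      (N, 2) integer (u, v) = (column, row) pixel of each projected point.
    """
    h, w = mask_prob_2d.shape
    u, v = pixel_uv[:, 0], pixel_uv[:, 1]
    # Supervision is sparse: only points that land inside the image contribute.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not inside.any():
        return 0.0
    # Sample the 2D prediction at the projected point locations (row = v, column = u).
    sampled_2d = mask_prob_2d[v[inside], u[inside]]
    diff = sampled_2d - point_prob_3d[inside]
    return float(np.mean(diff ** 2))
```

The loss is zero when both branches agree at every projected point and grows as their predictions diverge, which is the kind of inconsistency a cross-modal supervision term is meant to penalize.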