- The paper presents BDD100K, a large-scale, diverse driving dataset annotated for ten tasks to benchmark and enhance autonomous driving models.
- The study demonstrates that integrating tasks through multitask learning, including segmentation and tracking, significantly improves model performance over single-task approaches.
- The dataset’s extensive diversity in weather, lighting, and geographic conditions facilitates robust domain adaptation and real-world autonomous driving applications.
BDD100K: A Benchmark for Heterogeneous Multitask Learning in Autonomous Driving
Introduction
The field of computer vision has witnessed significant advancements, driven primarily by large-scale annotated datasets such as ImageNet and COCO. However, existing driving datasets fall short of supporting the multifaceted needs of autonomous driving. Researchers are often constrained by the lack of diverse, richly annotated data, which limits the exploration of complex multitask learning paradigms.
The BDD100K Dataset
BDD100K aims to bridge these gaps by offering a comprehensively annotated driving video dataset accompanied by extensive benchmarks for ten different tasks. The dataset comprises more than 100,000 video clips covering diverse scenarios, including varied weather conditions, geographic locations, and times of day.
Data Collection and Annotation
Data was collected through crowd-sourcing facilitated by Nexar, capturing diverse driving conditions across multiple US cities. The dataset is annotated for multiple tasks, including, but not limited to, image tagging, lane detection, drivable area segmentation, object detection, semantic segmentation, multiple object tracking (MOT), and multiple object tracking and segmentation (MOTS).
Benchmarks and Experimental Evaluations
Image Tagging
The dataset includes image-level annotations for weather, scene, and time of day, allowing for robust domain adaptation and transfer learning studies. Initial experiments using DLA-34 for these classification tasks yielded average accuracies of around 50-60%, reflecting the dataset's diversity and difficulty.
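As a rough illustration of how such per-attribute tagging accuracies can be computed, the sketch below evaluates each image-level attribute independently. The attribute names and label values are illustrative, not the exact BDD100K schema:

```python
# Hedged sketch: per-attribute tagging accuracy, assuming predictions and
# ground-truth labels are aligned lists of (weather, scene, time-of-day)
# tuples. Attribute names here are assumptions, not the BDD100K schema.

def per_attribute_accuracy(preds, labels, attributes=("weather", "scene", "timeofday")):
    """Return classification accuracy for each image-level attribute."""
    totals = {a: 0 for a in attributes}
    correct = {a: 0 for a in attributes}
    for p, y in zip(preds, labels):
        for i, a in enumerate(attributes):
            totals[a] += 1
            if p[i] == y[i]:
                correct[a] += 1
    return {a: correct[a] / totals[a] for a in attributes}
```

Averaging the resulting per-attribute scores gives the kind of aggregate accuracy the baseline experiments report.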
Lane Detection and Drivable Area Segmentation
BDD100K offers detailed lane marking annotations and drivable area segmentations. Baseline experiments show that models can extrapolate drivable areas even in the absence of clear lane markings. Lane marking evaluation, which covers attributes such as continuity and direction, improves when the task is trained jointly with drivable area segmentation, particularly on smaller training sets.
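Joint training of the two tasks typically optimizes a weighted sum of the per-task losses over a shared backbone. A minimal sketch, assuming per-pixel class probabilities are already available; the fixed weighting scheme is an assumption, not the paper's exact formulation:

```python
import math

# Hedged sketch of a two-head multitask loss: a weighted sum of per-pixel
# negative log-likelihoods for lane marking and drivable area heads.
# The weights w_lane / w_driv are illustrative hyperparameters.

def multitask_loss(lane_probs, lane_labels, driv_probs, driv_labels,
                   w_lane=1.0, w_driv=1.0):
    """Weighted sum of mean negative log-likelihoods for the two heads."""
    def nll(probs, labels):
        # probs: list of per-pixel class distributions; labels: class indices.
        return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
    return w_lane * nll(lane_probs, lane_labels) + w_driv * nll(driv_probs, driv_labels)
```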
Object Detection and Semantic Segmentation
For object detection, Faster R-CNN trained on domain-specific subsets showed significant performance discrepancies, particularly between city and non-city scenes and between daytime and nighttime. Reasonable mean IoUs for semantic segmentation show that models such as DRN-D benefit from the dataset's diversity. Domain differences from existing datasets such as Cityscapes were also evident, highlighting the complementary nature of BDD100K.
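Per-domain comparisons of this kind reduce to grouping matched predictions by a domain tag and averaging a localization quality measure such as box IoU. A minimal, hypothetical sketch (the domain tags and matching of prediction to ground truth are assumed given):

```python
from collections import defaultdict

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_iou_by_domain(matches):
    """matches: iterable of (domain_tag, predicted_box, ground_truth_box)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for domain, pred, gt in matches:
        sums[domain] += box_iou(pred, gt)
        counts[domain] += 1
    return {d: sums[d] / counts[d] for d in sums}
```

Comparing the per-domain averages (e.g. city vs. non-city, day vs. night) surfaces exactly the kind of discrepancy the baselines exhibit.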
Multiple Object Tracking (MOT) and Multiple Object Tracking and Segmentation (MOTS)
The tracking benchmark is notable for its scale, featuring over 3 million bounding boxes. Baseline experiments on the MOT benchmark highlight heavy occlusion and frequent re-identification challenges. For MOTS, which integrates bounding boxes from detection with instance segmentation, performance improved significantly when leveraging annotations from the simpler tasks.
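Tracking benchmarks of this kind are commonly scored with MOTA, which penalizes missed objects, false positives, and identity switches relative to the number of ground-truth objects. A minimal sketch of the standard formula; accumulating per-frame counts this way is illustrative:

```python
# Hedged sketch of the standard MOTA metric:
#   MOTA = 1 - (sum of FN + FP + ID switches) / (sum of ground-truth objects)
# frames: iterable of per-frame (fn, fp, idsw, gt) count tuples.

def mota(frames):
    """Multiple Object Tracking Accuracy accumulated over all frames."""
    errors = sum(fn + fp + sw for fn, fp, sw, _ in frames)
    gt = sum(g for *_, g in frames)
    return 1.0 - errors / gt
```

MOTSA, used for the MOTS benchmark, follows the same structure but counts mask-level rather than box-level matches.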
Multitask Learning Insights
The dataset facilitates multitask learning across homogeneous, cascaded, and heterogeneous settings:
- Homogeneous Multitask Learning: Joint training of lane marking and drivable area segmentation showed mutual benefits when training with smaller datasets, indicating the potential of multitask frameworks.
- Cascaded Multitask Learning: Significant improvements were observed in complex tasks like instance segmentation and multiple object tracking when trained jointly with simpler tasks like object detection.
- Heterogeneous Multitask Learning: The ultimate goal of integrating diverse tasks into a single model was explored through cascading and fine-tuning from pre-trained models. Notably, combining detection, instance segmentation, and tracking improved segmentation tracking accuracy (MOTSA).
Conclusion
BDD100K offers a rich, diverse dataset that significantly advances research in autonomous driving and heterogeneous multitask learning. It provides an invaluable resource for developing and benchmarking algorithms that can generalize well across diverse driving scenarios. Future developments may include exploring annotation strategies and enhancing dataset diversity to cover a broader range of driving conditions. This dataset stands as a testament to the importance of comprehensive data in progressing towards fully autonomous vehicles, capable of handling complex real-world tasks.