- The paper introduces a unified dataset with over 9.1M images, 30M labels, and 15.4M bounding boxes to enhance multi-task computer vision research.
- The dataset’s rigorous quality control and integrated annotations enable effective cross-task learning for classification, detection, and relationship modeling.
- Baseline evaluations with modern classification and detection models show how performance scales with training-set size, establishing reference results for this large-scale benchmark.
Overview of Open Images Dataset V4
The paper presents the Open Images Dataset (OID) V4, a large-scale dataset designed to advance several key tasks in computer vision, namely image classification, object detection, and visual relationship detection. The dataset represents a considerable advance over previous datasets in scale, annotation quality, and complexity, making it a valuable resource for researchers aiming to push the boundaries of deep learning in computer vision.
Key Characteristics of OID V4
OID V4’s massive scale is one of its most salient features. With over 9.1 million images and upwards of 30 million image-level labels, it dwarfs its predecessors in size. The dataset includes 15.4 million bounding boxes for 600 object categories, distributed across 1.9 million images. This is over 15 times as many bounding boxes as are available in comparable datasets such as COCO and ImageNet, facilitating the development of more sophisticated object detection models.
Another crucial aspect of OID V4 is its unified nature. Annotations for image classification, object detection, and visual relationship detection coexist within the same images. This comprehensive approach permits cross-task training and analysis, fostering an integrated understanding of visual scenes.
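The unified annotations described above can be pictured as a single per-image record carrying all three annotation types. The following is a minimal sketch of such a record; the class names, field names, and `plays_with` predicate are illustrative assumptions, not the official OID annotation format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema for a unified per-image annotation record.
@dataclass
class ImageAnnotations:
    image_id: str
    # Image-level class labels (classification task).
    labels: List[str] = field(default_factory=list)
    # (class, x_min, y_min, x_max, y_max) with normalized coordinates (detection task).
    boxes: List[Tuple[str, float, float, float, float]] = field(default_factory=list)
    # (subject box index, predicate, object box index) triplets (relationship task).
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)

# One image can serve all three tasks at once.
ann = ImageAnnotations(image_id="0001")
ann.labels.append("Dog")
ann.boxes.append(("Dog", 0.1, 0.2, 0.5, 0.9))
ann.boxes.append(("Ball", 0.6, 0.7, 0.7, 0.8))
ann.relationships.append((0, "plays_with", 1))
print(len(ann.boxes))  # → 2
```

Keeping all three annotation types on the same images is what makes cross-task training possible: a model can consume labels, boxes, and relationship triplets from a single sample.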
Methodology for Data Collection and Annotation
The images in OID V4 were sourced from Flickr, without a predefined set of class names or tags. This strategy allowed for natural class statistics and mitigated design biases inherent in many other datasets. All images are licensed under Creative Commons Attribution (CC-BY), enhancing the applicability and adaptability of models trained on this dataset.
Particular attention was paid to ensuring the geometric accuracy of the bounding boxes and the recall of image-level annotations, vetted through comparisons with expert annotations and consistency checks. Complex images featuring multiple objects were preferentially included, promoting research into visual relationship detection—a task requiring intricate reasoning about image content.
The paper provides a detailed statistical analysis of the dataset, which reveals that the average number of annotated bounding boxes per image is eight, signifying not only the complexity of the images but also the richness of the annotations. Such depth encourages the exploration of more sophisticated object detection methodologies and a finer-grained assessment of detector performance in various contexts.
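The reported annotation density follows directly from the dataset's headline figures, as a back-of-the-envelope check shows:

```python
# Figures taken from the dataset summary above.
total_boxes = 15.4e6   # annotated bounding boxes
boxed_images = 1.9e6   # images carrying at least one box

boxes_per_image = total_boxes / boxed_images
print(round(boxes_per_image, 1))  # → 8.1, i.e. roughly eight boxes per image
```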
Baseline evaluations across several modern models for image classification and object detection were performed. The results illustrate the evolution of model performance relative to increasing training data, establishing OID V4’s value in facilitating substantial improvements in state-of-the-art algorithms. Additionally, the paper reports baseline metrics for visual relationship detection, a testament to the dataset’s broad applicability.
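Object detection baselines of this kind are conventionally scored with mean Average Precision, which matches predicted boxes to ground-truth boxes by Intersection-over-Union (IoU). As a minimal sketch of that building block, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`:

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Intersection rectangle; width/height clamp to 0 when boxes don't overlap.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # → ~0.143
```

A detection typically counts as a true positive when its IoU with a ground-truth box exceeds a fixed threshold (0.5 is the common choice), and Average Precision is then computed per class and averaged.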
Implications and Future Directions
OID V4’s extensive and versatile annotations can significantly impact theoretical advances and practical applications in computer vision. The unified dataset paves the way for comprehensive scene understanding, cross-task learning, and multi-task models. Future research could focus on leveraging the dataset to build algorithms that contextually understand and interpret complex visual scenes, potentially surpassing current performance limits.
Moreover, as AI continues to evolve, datasets like OID V4 offer a fertile ground for exploring the interplay between different visual tasks, fostering innovation in multi-modal learning, transfer learning, and beyond. The dataset’s openness and adaptability enhance its utility in diverse research and commercial applications, making it a cornerstone for future advancements in computer vision.
In conclusion, the Open Images Dataset V4 stands as a significant contribution to the field, setting a high benchmark for scale, quality, and utility. Its unified framework and extensive annotations support a broad array of computer vision tasks, driving forward the development of more advanced, integrated, and intelligent visual recognition systems.