- The paper introduces the CityPersons dataset, offering over 35,000 pedestrian annotations from 27 cities to enhance detection diversity.
- The paper modifies the Faster R-CNN architecture with techniques like input up-scaling and quantized RPN scales to handle small and occluded pedestrians.
- The paper demonstrates significant performance gains, achieving a 5.1% miss rate on the Caltech benchmark and improving cross-dataset generalization.
CityPersons: A Diverse Dataset for Pedestrian Detection
The paper "CityPersons: A Diverse Dataset for Pedestrian Detection" by Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele addresses critical gaps in the field of pedestrian detection. The authors identify a need for diverse training data and carefully adapted convolutional neural network (CNN) architectures to advance state-of-the-art pedestrian detection. Their contributions include the introduction of the CityPersons dataset and significant adaptations to the Faster R-CNN architecture, leading to improved detection results, particularly under challenging conditions such as heavy occlusion and varying scales.
Dataset Introduction
CityPersons is an extension of the Cityscapes dataset, consisting of high-quality bounding box annotations for pedestrians spread across images from multiple cities. This dataset spans 27 cities, encompassing different seasons and weather conditions, and includes approximately 35,000 annotated individuals and 13,000 ignore regions. The dataset is designed to provide better generalization in pedestrian detection tasks by capturing a wider array of environmental and contextual variations compared to previous datasets like Caltech-USA and KITTI.
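To make the occlusion and scale statistics concrete, the sketch below filters a "reasonable" evaluation subset (pedestrians at least ~50 px tall with visibility above ~0.65, the convention used in this line of work) from annotations of this kind. The record fields (`bbox`, `vis_bbox`) are illustrative assumptions, not the official CityPersons annotation format:

```python
def visibility_ratio(bbox, vis_bbox):
    # bbox = (x, y, w, h) full-body box; vis_bbox = visible-part box.
    full_area = bbox[2] * bbox[3]
    vis_area = vis_bbox[2] * vis_bbox[3]
    return vis_area / full_area if full_area > 0 else 0.0

def reasonable_subset(annotations, min_height=50, min_visibility=0.65):
    # The "reasonable" setup keeps pedestrians taller than ~50 px whose
    # visible fraction exceeds ~0.65; shorter or more occluded boxes
    # fall into the harder "small" / "heavy occlusion" subsets.
    keep = []
    for ann in annotations:
        h = ann["bbox"][3]
        if h >= min_height and visibility_ratio(ann["bbox"], ann["vis_bbox"]) >= min_visibility:
            keep.append(ann)
    return keep
```

Splitting the data this way is what allows the per-subset results (reasonable vs. small vs. heavily occluded) reported below.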
Key Contributions
- CityPersons Dataset: The authors provide bounding box annotations for pedestrians on top of the Cityscapes dataset's existing semantic segmentation annotations. This new dataset enables a more robust evaluation of pedestrian detection algorithms due to its higher diversity and larger volume of annotations. By integrating difficult cases such as heavy occlusion and small-scale pedestrians, CityPersons challenges existing models to improve their detection robustness.
- Adaptation of Faster R-CNN: The authors propose several modifications to the standard Faster R-CNN to enhance its performance for pedestrian detection. These modifications include:
- Quantizing RPN scales for better handling of small objects.
- Up-scaling input images so that small pedestrians better match the scale statistics of ImageNet pre-trained models.
- Reducing feature stride to improve localization for small objects.
- Handling ignore regions during the training phase to avoid introducing confusing samples.
- Switching to the Adam solver for more consistent training.
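The scale-related adaptations above can be illustrated with a small sketch: generating one RPN anchor per quantized pedestrian height at a fixed aspect ratio (w/h ≈ 0.41 is a common convention for upright persons). The specific heights and counts here are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def pedestrian_anchors(heights, aspect_ratio=0.41):
    # One centered anchor box per quantized height, in (x1, y1, x2, y2) form.
    # Width follows a fixed pedestrian aspect ratio instead of the generic
    # multi-aspect-ratio anchors used for general object detection.
    anchors = []
    for h in heights:
        w = h * aspect_ratio
        anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)

# Example: heights log-spaced to cover small-to-large pedestrians.
heights = np.round(np.logspace(np.log10(40), np.log10(400), num=9))
anchors = pedestrian_anchors(heights)
```

Quantizing scales this way concentrates the anchor budget on the size range pedestrians actually occupy, which matters most for the small-object regime.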
Experimental Results
The adapted Faster R-CNN, when pre-trained on CityPersons and fine-tuned on Caltech, achieves notable performance improvements:
- On the Caltech benchmark, the adapted model achieves a log-average miss rate (MR) of 5.1% at IoU 0.50, outperforming previous state-of-the-art detectors.
- When evaluating smaller scale and heavily occluded cases, the gains are even more pronounced, indicating that CityPersons pre-training significantly enhances the model's robustness in challenging scenarios.
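The Caltech metric behind these numbers, the log-average miss rate, averages the miss rate at nine FPPI (false positives per image) points log-spaced in [10⁻², 10⁰]. A minimal sketch of that computation, assuming a step-interpolated miss-rate curve:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, lo=1e-2, hi=1.0, num=9):
    # fppi must be sorted ascending; miss_rate gives the miss rate
    # achieved at each fppi operating point.
    ref = np.logspace(np.log10(lo), np.log10(hi), num)
    mrs = []
    for r in ref:
        # Miss rate at the largest FPPI not exceeding r (step interpolation);
        # if the curve never reaches r, count a miss rate of 1.0.
        idx = np.searchsorted(fppi, r, side="right") - 1
        mrs.append(miss_rate[idx] if idx >= 0 else 1.0)
    mrs = np.maximum(mrs, 1e-10)  # guard against log(0)
    # Average in log space, i.e. the geometric mean of the sampled miss rates.
    return float(np.exp(np.mean(np.log(mrs))))
```

Lower values are better, so a 5.1% MR means roughly one pedestrian in twenty is missed at the sampled operating points.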
Generalization and Implications
The CityPersons dataset demonstrates strong generalization capabilities across multiple pedestrian detection benchmarks, including INRIA, ETH, and TUD-Brussels. Faster R-CNN models trained on CityPersons exhibit better generalization than those trained on single-city datasets like Caltech or KITTI. This evidence suggests that diverse and richly annotated datasets are crucial for developing robust pedestrian detection systems capable of performing well across various environments and conditions.
Furthermore, the dataset's diversity and volume directly contribute to improved localization quality and occlusion handling. The detailed analysis also highlights that properly aligned bounding box annotations and effective handling of ignore regions can significantly boost detection performance.
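Localization quality is measured via intersection over union (IoU) between detections and ground-truth boxes; stricter thresholds (e.g., 0.75 instead of 0.50) reward the better-aligned annotations discussed above. A minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    # Intersection rectangle of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as correct only if its IoU with some ground-truth box exceeds the chosen threshold, so tighter annotations translate directly into gains under stricter matching.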
Future Developments
The availability of the CityPersons dataset opens avenues for exploring various modalities and leveraging additional contextual information:
- Incorporating semantic labels, as demonstrated in preliminary trials, shows potential for further boosting detection performance, particularly for small-scale pedestrians.
- The rich annotations in CityPersons can serve as a benchmark to test novel model architectures and training strategies aimed at improving pedestrian detection under real-world conditions.
Conclusion
CityPersons represents a significant advancement in providing a diverse, high-quality dataset for pedestrian detection. The adaptations to Faster R-CNN underscore the importance of model-specific modifications to tackle specialized detection tasks effectively. This work not only sets a new benchmark for pedestrian detection but also provides a robust foundation for future research exploring advanced CNN architectures and cross-dataset generalization.