Adaptive NMS: Refining Pedestrian Detection in a Crowd

Published 7 Apr 2019 in cs.CV | (1904.03629v1)

Abstract: Pedestrian detection in a crowd is a very challenging issue. This paper addresses this problem by a novel Non-Maximum Suppression (NMS) algorithm to better refine the bounding boxes given by detectors. The contributions are threefold: (1) we propose adaptive-NMS, which applies a dynamic suppression threshold to an instance, according to the target density; (2) we design an efficient subnetwork to learn density scores, which can be conveniently embedded into both the single-stage and two-stage detectors; and (3) we achieve state of the art results on the CityPersons and CrowdHuman benchmarks.




Summary

The paper proposes adaptive-NMS, an algorithm with a dynamic suppression threshold based on local density to improve pedestrian detection accuracy in crowded scenes.
A dedicated density prediction subnetwork is introduced that integrates seamlessly into existing object detection architectures.
Adaptive-NMS demonstrates enhanced performance over conventional methods on CityPersons and CrowdHuman benchmarks, achieving lower Miss Rates under dense conditions.

Adaptive Non-Maximum Suppression for Pedestrian Detection in Crowded Scenes

The paper "Adaptive NMS: Refining Pedestrian Detection in a Crowd" addresses pedestrian detection in densely populated scenes. Because traditional Non-Maximum Suppression (NMS) with a single fixed threshold performs poorly in such scenes, the authors propose adaptive-NMS, an algorithm that sets the suppression threshold dynamically for each instance according to the predicted density of the pedestrians around it.

The core problem is overlapping detections in crowds. With a fixed threshold, conventional NMS faces a trade-off: a high threshold keeps heavily overlapping boxes and so admits more false positives, while a low threshold suppresses true detections of people standing close together. Adaptive-NMS replaces the fixed value with a per-instance threshold, making detectors more robust to the dense pedestrian layouts found in places such as airports or busy city streets.

Key Contributions

Dynamic Suppression Threshold: Adaptive-NMS applies a suppression threshold that varies with the local density around each detected instance. In crowded regions the threshold is raised so that heavily overlapping neighbors survive; in sparse regions it stays low so that duplicate detections and false positives are pruned.
Density Prediction Subnetwork: A dedicated subnetwork predicts a density score for each instance. It is designed as an add-on module that can be embedded into both single-stage and two-stage object detectors, improving their performance in crowded scenes without changes to their core structure.
Performance Benchmarking: Adaptive-NMS outperforms conventional methods such as greedy-NMS and soft-NMS on the CityPersons and CrowdHuman benchmarks. Results are reported as log-average miss rate, $\text{MR}^{-2}$ (lower is better), which is consistently lower under dense conditions: 10.8% $\text{MR}^{-2}$ on CityPersons and 49.73% $\text{MR}^{-2}$ on CrowdHuman.
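
The dynamic threshold described in the first contribution can be illustrated with a short sketch. This is a minimal NumPy version under stated assumptions, not the authors' implementation: it uses hard (greedy) suppression, boxes in `[x1, y1, x2, y2]` format, and density scores that are already available per detection (in the paper they come from the learned subnetwork). The names `iou_one_to_many`, `density_targets`, `adaptive_nms`, and `base_thresh` are illustrative. The threshold for the current top-scoring box is taken as the larger of the base threshold and that box's density, and the density target of a ground-truth box is taken as its maximum IoU with any other ground-truth box, which is how the paper's definitions are understood here.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def density_targets(gt_boxes):
    """Assumed density label: max IoU of each ground-truth box with any other."""
    gt = np.asarray(gt_boxes, dtype=float)
    density = np.zeros(len(gt))
    for i in range(len(gt)):
        others = np.delete(gt, i, axis=0)
        if len(others) > 0:
            density[i] = iou_one_to_many(gt[i], others).max()
    return density


def adaptive_nms(boxes, scores, densities, base_thresh=0.5):
    """Greedy NMS whose threshold is raised to the density of the kept box."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    densities = np.asarray(densities, dtype=float)
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        m = order[0]
        keep.append(int(m))
        rest = order[1:]
        if rest.size == 0:
            break
        # Crowded box -> higher threshold; isolated box -> base threshold.
        thresh = max(base_thresh, densities[m])
        overlaps = iou_one_to_many(boxes[m], boxes[rest])
        order = rest[overlaps < thresh]
    return keep


# Two heavily overlapping pedestrians plus one isolated pedestrian.
boxes = [[0, 0, 10, 10], [3, 0, 13, 10], [100, 100, 110, 110]]
scores = [0.9, 0.8, 0.7]
fixed = adaptive_nms(boxes, scores, densities=[0.0, 0.0, 0.0])
adaptive = adaptive_nms(boxes, scores, densities=[0.6, 0.6, 0.0])
```

With all densities set to zero the function reduces to greedy-NMS at `base_thresh`, so the second box (IoU of about 0.54 with the first) is suppressed. With a density of 0.6 on the two overlapping boxes the threshold rises above their IoU and both survive.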

Implications and Future Direction

An adaptive NMS approach matters most for pedestrian detection applications where accuracy is critical, such as automated surveillance, autonomous driving, and robotics. A system that reliably detects closely packed individuals is more robust in real-world environments whose crowd density changes over time.

On the theoretical side, the work bears on how convolutional neural network (CNN) outputs are post-processed. Classic CNN detection frameworks focus on discriminative features and treat NMS as a fixed, hand-tuned step, whereas adaptive-NMS conditions that step on a learned, per-instance density estimate. This points toward further work on instance-aware processing strategies.

Future research could refine the subnetwork architecture to predict densities more precisely and efficiently. The approach could also be extended to other detection tasks with heavily overlapping objects and evaluated in cross-domain settings.
