- The paper introduces POI, a multiple object tracking framework that improves tracking performance by combining a high-performance detector with discriminative, deep learning-based appearance features.
- POI employs a Faster R-CNN based detector trained on diverse datasets and a GoogLeNet based appearance feature trained with triplet loss on person re-ID data.
- Evaluations show POI outperforms state-of-the-art methods, highlighting that superior detection and appearance features are key drivers for achieving high performance in multiple object tracking.
The paper "POI: Multiple Object Tracking with High Performance Detection and Appearance Feature" addresses multiple object tracking (MOT) and argues that high-performance detection and deep learning-based appearance features are critical to tracking quality. The authors analyze both online and offline tracking scenarios.
Detection Strategy
The paper highlights that detection results directly shape the data association step in MOT. Its detector is based on Faster R-CNN, with a VGG-16 network fine-tuned from an ImageNet-pretrained model. Training combines diverse datasets, including the ETHZ and Caltech pedestrian datasets, with a substantial self-collected surveillance dataset. Training uses a multi-scale strategy, while testing is restricted to a single scale. Additional strategies such as skip pooling and multi-region feature combination reduce both false negatives (FN) and false positives (FP), as evidenced by a substantial decrease in the FP+FN total.
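To make the FP+FN total concrete, the following is a minimal sketch of how false positives and false negatives could be counted for one frame. It assumes boxes in `(x1, y1, x2, y2)` form, a greedy matching rule, and an IoU threshold of 0.5; the function names and the matching rule are illustrative assumptions, not the evaluation protocol used in the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_fp_fn(detections, ground_truth, iou_threshold=0.5):
    """Greedily match detections to ground truth and return (FP, FN).

    Assumption: each ground-truth box can be matched at most once, and a
    detection is a true positive only if its best IoU reaches the threshold.
    """
    unmatched_gt = list(range(len(ground_truth)))
    fp = 0
    for det in detections:
        best_j, best_iou = None, iou_threshold
        for j in unmatched_gt:
            score = iou(det, ground_truth[j])
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is None:
            fp += 1  # no ground-truth box overlaps enough
        else:
            unmatched_gt.remove(best_j)
    # Ground-truth boxes left unmatched are misses (false negatives).
    return fp, len(unmatched_gt)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(count_fp_fn(dets, gt))  # (1, 1): one spurious box, one missed target
```

A detector that lowers both counts gives the tracker fewer spurious boxes to reject and fewer gaps to bridge during data association.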
Appearance Feature
In MOT, the affinity between appearance features is pivotal for effective data association. The study extracts features with a network modeled on the GoogLeNet architecture that outputs a 128-dimensional feature vector. The network is trained on a large dataset assembled from several person re-identification sources, such as Market-1501 and CUHK03, using softmax and triplet loss together to make the features more discriminative. Cosine distance between feature vectors then serves as the appearance measure, so that detections of the same identity lie close together.
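The role of the triplet loss and the cosine distance can be sketched in a few lines. This is an illustration only: the margin value, the use of cosine distance inside the loss, and the 2-dimensional toy vectors (standing in for the 128-dimensional features) are assumptions, not details taken from the paper.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (margin is a hypothetical value).

    The loss is zero once the anchor is closer to the positive (same identity)
    than to the negative (different identity) by at least the margin.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [1.0, 0.0]
same_person = [1.0, 0.0]
other_person = [0.0, 1.0]
print(triplet_loss(anchor, same_person, other_person))  # 0.0, already well separated
print(triplet_loss(anchor, same_person, same_person))   # 0.2, negative is too close
```

Training pushes the loss toward zero across many triplets, which is what makes a small cosine distance a usable signal for "same identity" at tracking time.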
Online Tracker
The online tracker uses a Kalman filter for motion prediction and the Kuhn-Munkres algorithm for data association. At each frame it builds an affinity matrix between tracks and detections that combines motion, shape, and appearance affinities, then applies data association strategies such as dividing tracks into high-quality and low-quality groups based on a set threshold. This design aims to keep associations reliable and tracking accurate even when detections are missing.
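The affinity matrix and the assignment step can be sketched as follows. Several things here are assumptions made for illustration: the exact form of each affinity term, the weights `w_motion` and `w_shape`, multiplying the three terms together, the `min_affinity` cutoff, and the dict layout of tracks and detections. For brevity the sketch finds the best assignment by exhaustive search, which is only practical for a handful of targets; the paper uses the Kuhn-Munkres algorithm, which reaches the same optimum efficiently.

```python
import math
from itertools import permutations


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def affinity(track, det, w_motion=0.5, w_shape=1.5):
    """Combined affinity between a track and a detection (illustrative form).

    Both are dicts with centre x, y, size w, h and an appearance vector feat.
    The track position is assumed to be the Kalman-predicted one.
    """
    app = cosine_similarity(track["feat"], det["feat"])
    dx = (track["x"] - det["x"]) / det["w"]
    dy = (track["y"] - det["y"]) / det["h"]
    mot = math.exp(-w_motion * (dx * dx + dy * dy))
    shp = math.exp(-w_shape * (
        abs(track["h"] - det["h"]) / (track["h"] + det["h"])
        + abs(track["w"] - det["w"]) / (track["w"] + det["w"])))
    return app * mot * shp


def associate(tracks, dets, min_affinity=0.3):
    """Return sorted (track_index, det_index) pairs maximising total affinity."""
    if not tracks or not dets:
        return []
    n, m = len(tracks), len(dets)
    matrix = [[affinity(t, d) for d in dets] for t in tracks]
    if n <= m:
        candidates = ([(i, p[i]) for i in range(n)]
                      for p in permutations(range(m), n))
    else:
        candidates = ([(p[j], j) for j in range(m)]
                      for p in permutations(range(n), m))
    best, best_score = [], float("-inf")
    for pairs in candidates:
        score = sum(matrix[i][j] for i, j in pairs)
        if score > best_score:
            best, best_score = pairs, score
    # Drop weak pairs so a poor match does not extend a track.
    return sorted((i, j) for i, j in best if matrix[i][j] >= min_affinity)


tracks = [{"x": 0, "y": 0, "w": 10, "h": 20, "feat": [1.0, 0.0]},
          {"x": 50, "y": 0, "w": 10, "h": 20, "feat": [0.0, 1.0]}]
dets = [{"x": 51, "y": 0, "w": 10, "h": 20, "feat": [0.0, 1.0]},
        {"x": 1, "y": 0, "w": 10, "h": 20, "feat": [1.0, 0.0]}]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

Tracks left unmatched after this step are the ones a tracker would keep alive on Kalman prediction alone, which is how short detection gaps can be bridged.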
Offline Tracker
In contrast to the online tracker, the offline tracker builds on H2T and uses K-Dense Neighbors. The paper describes several improvements aimed at robustness and efficiency: an appearance representation based on CNN-derived features, adjustments for handling targets of mixed scales, and lower computational cost through a smaller affinity matrix.
Evaluation and Results
Comparative evaluations against state-of-the-art methods show superior performance for the POI trackers, particularly on the MOTA and MOTP metrics. Notably, the offline tracker achieves a marked reduction in FN, while its FP count is affected by the interpolation it applies.
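Because MOTA aggregates exactly the error types discussed here, a short sketch of the standard CLEAR MOT definition helps show why FN and FP reductions translate into a higher score. The counts in the usage example are made-up numbers for illustration, not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence; num_gt is the
    total number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical counts: 20 misses, 10 false alarms, 5 identity switches, 100 GT boxes.
print(mota(20, 10, 5, 100))
```

Since FN and FP usually dominate the numerator, a trade of fewer misses for more false alarms can still change MOTA in either direction, depending on the size of each change.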
Implications and Future Directions
This research shows that advanced detection and appearance features can substantially improve MOT. With superior detection and feature extraction, the expected advantage of complex offline trackers over simpler online ones diminishes. These insights motivate tracking frameworks built on high-quality detection data and robust feature models.
By making their detection and re-identification features publicly available, the authors support wider research and development in the community, with the aim of accelerating progress in the accuracy and reliability of multiple object tracking systems. Future work may further refine feature extraction, improve computational efficiency, and improve scalability across varied and complex environments.