POI: Multiple Object Tracking with High Performance Detection and Appearance Feature

Published 19 Oct 2016 in cs.CV | (1610.06136v1)

Abstract: Detection and learning-based appearance features play a central role in data-association-based multiple object tracking (MOT), but most recent MOT works ignore them and focus only on hand-crafted features and association algorithms. In this paper, we explore high-performance detection and deep-learning-based appearance features, and show that they lead to significantly better MOT results in both online and offline settings. We make our detections and appearance features publicly available. In the following sections, we first summarize the detection and appearance features, and then introduce our tracker, named Person of Interest (POI), which has both an online and an offline version.

Citations (430)

Summary

The paper introduces POI, a multiple object tracking framework that significantly improves performance by integrating a novel high-performance detector and discriminative deep learning-based appearance features.
POI employs a Faster R-CNN-based detector trained on diverse datasets and a GoogLeNet-based appearance feature trained with softmax and triplet losses on person re-ID data.
Evaluations show POI outperforms state-of-the-art methods, highlighting that superior detection and appearance features are key drivers for achieving high performance in multiple object tracking.

Multiple Object Tracking with High-Performance Detection and Deep Learning-Based Appearance Features

The paper "POI: Multiple Object Tracking with High Performance Detection and Appearance Feature" argues that, in multiple object tracking (MOT), high-performance detection and deep-learning-based appearance features are decisive for tracking quality. The authors build both an online and an offline tracker on these components and evaluate each.

Detection Strategy

The paper stresses that detection results are indispensable for data association in MOT. The detector is based on Faster R-CNN with a VGG-16 backbone pre-trained on ImageNet and then fine-tuned for pedestrians. Training combines several datasets, including the ETHZ and Caltech pedestrian datasets, with a large self-collected surveillance dataset. Training is multi-scale, while testing uses a single scale. Adding skip pooling and multi-region feature combination lowers both false negatives (FN) and false positives (FP), giving a substantial decrease in the FP+FN total.
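
The FP+FN total used to compare detector variants can be illustrated with a short Python sketch. It matches detections to ground-truth boxes greedily by score at an IoU threshold of 0.5; the threshold, the greedy matching rule, and the helper names `iou` and `count_fp_fn` are assumptions for illustration, not the paper's evaluation code.

```python
# Hypothetical helpers: count true positives, false positives (FP) and false
# negatives (FN) for one frame. Boxes are (x1, y1, x2, y2); the greedy rule
# and the 0.5 IoU threshold are assumed conventions, not taken from the paper.


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_fp_fn(detections, ground_truth, iou_thresh=0.5):
    """Highest-scoring detections claim ground-truth boxes first.

    detections: list of (box, score); ground_truth: list of boxes.
    Returns (tp, fp, fn).
    """
    unmatched_gt = set(range(len(ground_truth)))
    tp = fp = 0
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = None, iou_thresh
        for j in unmatched_gt:
            overlap = iou(box, ground_truth[j])
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1  # no remaining ground-truth box overlaps enough
        else:
            tp += 1
            unmatched_gt.remove(best_j)
    fn = len(unmatched_gt)  # ground-truth boxes that no detection claimed
    return tp, fp, fn
```

For example, with ground truth `[(0, 0, 10, 10), (20, 0, 30, 10)]` and detections `[((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]`, the first detection matches the first box (IoU 0.81), the second detection is an FP, and the second ground-truth box is an FN, so the function returns `(1, 1, 1)` and FP+FN is 2.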

Appearance Feature

In MOT, the affinity between appearance features is central to data association. The appearance network follows the GoogLeNet architecture and outputs a 128-dimensional feature vector. It is trained on a large dataset pooled from several person re-identification sources, such as Market-1501 and CUHK03, using softmax and triplet losses so that features of the same identity lie close together and features of different identities lie far apart. Appearance similarity between a track and a detection is then measured with the cosine distance between their feature vectors.
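
The two quantities this paragraph relies on, cosine distance between embeddings and a triplet loss with a margin, can be sketched in a few lines of Python. The vectors here are short for readability, but the same functions work on 128-dimensional features. The margin value of 0.2 and the use of cosine distance inside the loss are assumptions for illustration, not details confirmed by the paper.

```python
import math

# Sketch of cosine distance and a margin-based triplet loss on plain lists.
# ASSUMPTION: the margin (0.2) and cosine distance inside the loss are
# illustrative choices, not the paper's exact training configuration.


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, 2 for opposite."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that reaches zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

With anchor `[1, 0]`, positive `[1, 0]` and negative `[0, 1]`, the negative is already a full unit of distance farther away than the positive, so the loss is zero; if the negative were identical to the anchor, the loss would equal the margin.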

Online Tracker

The online tracker uses a Kalman filter to predict each track's motion and the Kuhn-Munkres (Hungarian) algorithm to solve data association. For each frame, it builds an affinity matrix that combines motion, shape, and appearance affinities between tracks and detections. Tracks are also divided into high-quality and low-quality groups based on a threshold, and the groups are handled separately during association. The goal is to keep association accurate even when detections are missing in some frames.
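
The online association step can be sketched in plain Python. In this hypothetical version, each affinity is the product of a cosine appearance term, an exponential motion term on the box center, and a shape term on width and height; the product form, the weights `w_motion` and `w_shape`, the gating threshold `min_aff`, the `quality` field, and the order in which the two track groups are matched are all assumptions. The Kalman filter itself is omitted: `trk["box"]` stands for the box the filter predicts for the current frame. Assignment uses a compact Kuhn-Munkres implementation with potentials.

```python
import math

# Hypothetical sketch of one online association step. Boxes are (cx, cy, w, h)
# and trk["box"] is ASSUMED to be the Kalman-predicted box for this frame.
# Weights, thresholds and the "quality" field are illustrative assumptions.


def affinity(trk, det, w_motion=0.5, w_shape=1.5):
    """Product of appearance, motion and shape affinities, in [0, 1]."""
    (tx, ty, tw, th), (dx, dy, dw, dh) = trk["box"], det["box"]
    fa, fb = trk["feat"], det["feat"]
    dot = sum(a * b for a, b in zip(fa, fb))
    norm = math.sqrt(sum(a * a for a in fa) * sum(b * b for b in fb))
    app = max(0.0, dot / norm)  # cosine similarity, clamped at zero
    mot = math.exp(-w_motion * (((tx - dx) / dw) ** 2 + ((ty - dy) / dh) ** 2))
    shp = math.exp(-w_shape * (abs(th - dh) / (th + dh) + abs(tw - dw) / (tw + dw)))
    return app * mot * shp


def _hungarian(cost):
    """Kuhn-Munkres with potentials for an n x m cost matrix, n <= m.
    Returns {row: column} minimising the total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update the dual potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}


def assign(cost):
    """Optimal (row, col) pairs for any rectangular cost matrix."""
    if not cost or not cost[0]:
        return []
    if len(cost) <= len(cost[0]):
        return sorted(_hungarian(cost).items())
    transposed = [list(col) for col in zip(*cost)]
    return sorted((r, c) for c, r in _hungarian(transposed).items())


def associate(tracks, dets, quality_thresh=0.5, min_aff=0.3):
    """Match high-quality tracks first, then low-quality tracks to leftovers.
    Returns (list of (track_idx, det_idx), unmatched detection indices)."""
    free, matches = list(range(len(dets))), []
    high = [i for i, t in enumerate(tracks) if t["quality"] >= quality_thresh]
    low = [i for i, t in enumerate(tracks) if t["quality"] < quality_thresh]
    for group in (high, low):
        if not group or not free:
            continue
        aff = [[affinity(tracks[i], dets[j]) for j in free] for i in group]
        cost = [[1.0 - a for a in row] for row in aff]  # minimise 1 - affinity
        used = set()
        for r, c in assign(cost):
            if aff[r][c] >= min_aff:  # gate out weak pairings
                matches.append((group[r], free[c]))
                used.add(c)
        free = [j for k, j in enumerate(free) if k not in used]
    return matches, free
```

Matching the high-quality group first means a weak track cannot claim a detection that a reliable track needs. In this sketch, detections left in the returned list are simply reported back, so a caller could start new tracks from them.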

Offline Tracker

The offline tracker extends H²T and uses K-Dense Neighbors (KDN) for association. The paper describes three modifications aimed at robustness and efficiency: the appearance representation is replaced with the CNN features described above, the method is adjusted to handle targets of mixed scales, and computation is reduced by shrinking the dimensions of the affinity matrix.
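
Dense-neighbor search can be illustrated with a greedy Python sketch. Here tracklets are nodes of a graph with a symmetric affinity matrix `aff`, and a group is grown from a seed by adding, one node at a time, the node with the largest summed affinity to the current members. This greedy rule, the function names, and the density measure (average pairwise affinity) are assumptions; the sketch is not the exact KDN optimization used in the paper.

```python
# Greedy approximation of a K-dense-neighbor search.
# ASSUMPTION: this greedy growth rule stands in for the exact KDN solver.
# `aff` is a symmetric tracklet affinity matrix with zeros on the diagonal.


def dense_neighbors(aff, seed, k):
    """Grow a group of up to k nodes around `seed`, each step adding the
    candidate most strongly connected to the current group (ties: lowest index)."""
    group = [seed]
    candidates = set(range(len(aff))) - {seed}
    while len(group) < k and candidates:
        best = max(candidates, key=lambda j: (sum(aff[i][j] for i in group), -j))
        group.append(best)
        candidates.remove(best)
    return group


def density(aff, group):
    """Average pairwise affinity inside the group (0 for singletons)."""
    n = len(group)
    if n < 2:
        return 0.0
    total = sum(aff[a][b] for idx, a in enumerate(group) for b in group[idx + 1:])
    return total / (n * (n - 1) / 2)
```

With three tightly linked tracklets (pairwise affinities 0.8 to 0.9) and one outlier (affinity 0.1 to everyone), the search from node 0 with `k=3` returns `[0, 1, 2]`, whose density is about 0.87, and leaves the outlier out.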

Evaluation and Results

Comparisons against state-of-the-art methods show better performance for both POI trackers, particularly on MOTA and MOTP. The offline tracker reduces FN markedly, but at the cost of a higher FP count, which is attributed to interpolation.
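
The FN/FP trade-off of interpolation, and how both errors feed into MOTA, can be shown with two small Python helpers. MOTA follows the standard CLEAR MOT definition; the linear gap-filling rule and the track representation (a dict from frame index to box) are assumptions for illustration, not the paper's interpolation procedure.

```python
# Hypothetical helpers. A track is a dict {frame_index: (x1, y1, x2, y2)}.
# ASSUMPTION: plain linear interpolation stands in for the paper's procedure.


def interpolate_track(track):
    """Fill missing frames between observed ones by linear interpolation."""
    frames = sorted(track)
    filled = dict(track)
    for f0, f1 in zip(frames, frames[1:]):
        for f in range(f0 + 1, f1):
            t = (f - f0) / (f1 - f0)  # fraction of the way from f0 to f1
            filled[f] = tuple(a + t * (b - a) for a, b in zip(track[f0], track[f1]))
    return filled


def mota(fn, fp, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed
    over all frames. Can be negative when errors exceed ground-truth objects."""
    return 1.0 - (fn + fp + id_switches) / num_gt
```

Each interpolated box that covers a person the detector missed removes an FN, while one that does not overlap any ground-truth object adds an FP; MOTA weights both errors equally, so interpolation helps only when it recovers more misses than it introduces false boxes.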

Implications and Future Directions

This work shows that better detection and appearance features can substantially improve MOT. It also finds that, with strong detection and features, the advantage usually expected of complex offline trackers over simpler online ones shrinks. These results suggest building future trackers on high-quality detections and robust feature models.

By releasing their detections and re-identification features publicly, the authors make it easier for others to build and compare MOT systems. Possible future work includes better feature extraction, higher computational efficiency, and scalability to more varied and complex environments.
