Features for Multi-Target Multi-Camera Tracking and Re-Identification

Published 28 Mar 2018 in cs.CV | (1803.10859v1)

Abstract: Multi-Target Multi-Camera Tracking (MTMCT) tracks many people through video taken from several cameras. Person Re-Identification (Re-ID) retrieves from a gallery images of people similar to a person query image. We learn good features for both MTMCT and Re-ID with a convolutional neural network. Our contributions include an adaptive weighted triplet loss for training and a new technique for hard-identity mining. Our method outperforms the state of the art both on the DukeMTMC benchmarks for tracking, and on the Market-1501 and DukeMTMC-ReID benchmarks for Re-ID. We examine the correlation between good Re-ID and good MTMCT scores, and perform ablation studies to elucidate the contributions of the main components of our system. Code is available.

Citations (502)

Summary

  • The paper introduces an adaptive weighted triplet loss and a novel hard-identity mining approach for effective feature learning in multi-camera tracking and re-identification.
  • It demonstrates state-of-the-art performance on DukeMTMC and Market-1501 benchmarks, showcasing improved IDF1 and identity recall metrics.
  • The study highlights that enhanced re-identification accuracy yields diminishing returns for tracking in moderately crowded scenes.

Features for Multi-Target Multi-Camera Tracking and Re-Identification

The paper introduces a comprehensive method for Multi-Target Multi-Camera Tracking (MTMCT) and Person Re-Identification (Re-ID) using convolutional neural networks. The primary contributions include an adaptive weighted triplet loss for training, a novel approach for hard-identity mining, and state-of-the-art results on the DukeMTMC benchmarks for tracking and the Market-1501 and DukeMTMC-ReID benchmarks for re-identification.

Figure 1: Two example multi-camera results from the tracker on the DukeMTMC dataset.

Introduction

MTMCT aims to track multiple people across several camera feeds, making it essential for surveillance and for monitoring crowded environments such as airports and shopping centers. Given the cameras' non-overlapping fields of view and varying illumination and viewpoints, automating MTMCT presents significant challenges. Re-ID, in turn, identifies individuals across different cameras, where ranking performance is crucial.

The paper discusses the nuanced differences between MTMCT and Re-ID, emphasizing that despite their apparent similarity they are evaluated differently: MTMCT is scored on classification-style identity metrics, while Re-ID is scored on ranking performance. Although interconnected, the two tasks therefore call for tailored loss functions.

The authors leverage a coupled approach using a triplet loss function typical of Re-ID, supplemented by hard-data mining methods, to develop high-performance features applicable to both MTMCT and Re-ID. Notably, the study reveals that enhanced Re-ID accuracy does not always translate linearly to improved MTMCT performance in moderately crowded scenes.

Methodology

The proposed pipeline begins with the input video streams, from which bounding-box observations of people are obtained using a state-of-the-art person detector.

Figure 2: The pipeline for Multi-Target Multi-Camera Tracking illustrating the processing steps from detection to multi-camera trajectory.

Learning Appearance Features

The learning of appearance features employs an adaptive weighted triplet loss, designed to emphasize difficult samples. This approach circumvents the combinatorial complexity of triplet generation and addresses class imbalance through weight distribution.

The triplet loss L_3 is defined as:

L_3 = \left[ m + \sum_{x_p \in P(a)} w_p \, d(x_a, x_p) - \sum_{x_n \in N(a)} w_n \, d(x_a, x_n) \right]_+

where m is the margin, d denotes the Euclidean distance, P(a) and N(a) are the positive and negative sets for anchor x_a, and w_p, w_n are adaptive weights. The weights are assigned with softmax/softmin distributions to facilitate robust training, even in the presence of outliers.
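As a concrete illustration, the adaptive weighting can be sketched with softmax weights over anchor-positive distances and softmin weights over anchor-negative distances. This is a minimal NumPy sketch of the loss for a single anchor, not the authors' implementation; the embedding network and batch construction are omitted:

```python
import numpy as np

def adaptive_weighted_triplet_loss(anchor, positives, negatives, margin=1.0):
    """Adaptive weighted triplet loss for one anchor (illustrative sketch).

    Hard positives (far from the anchor) and hard negatives (close to
    the anchor) receive larger weights, so no explicit enumeration of
    all triplets is needed.
    """
    d_pos = np.linalg.norm(positives - anchor, axis=1)  # d(x_a, x_p)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)  # d(x_a, x_n)

    # Softmax weights: emphasize distant (hard) positives.
    w_p = np.exp(d_pos) / np.exp(d_pos).sum()
    # Softmin weights: emphasize nearby (hard) negatives.
    w_n = np.exp(-d_neg) / np.exp(-d_neg).sum()

    # Hinge: [m + weighted positive distance - weighted negative distance]_+
    return max(0.0, margin + (w_p * d_pos).sum() - (w_n * d_neg).sum())
```

With well-separated features the hinge is inactive and the loss is zero; when negatives sit closer to the anchor than positives, the loss grows with the weighted margin violation.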

A hard-identity mining scheme is introduced that samples challenging identities more frequently during training, further improving the efficiency of feature learning.
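A minimal sketch of such a sampler, assuming a precomputed pool of currently hard identities (nearest neighbours in feature space); the function name and the identity counts are illustrative, not the paper's settings:

```python
import random

def sample_batch_identities(all_ids, hard_pool, n_ids=18, n_hard=6):
    """Draw identities for one batch, biased toward hard ones (sketch).

    A fixed fraction of the identities comes from the hard pool; the
    remainder is sampled uniformly from the other identities.
    """
    hard = random.sample(hard_pool, min(n_hard, len(hard_pool)))
    remaining = [i for i in all_ids if i not in hard]
    easy = random.sample(remaining, n_ids - len(hard))
    return hard + easy
```

In practice the hard pool would be refreshed periodically from the current embedding, so the sampler keeps targeting identities the network still confuses.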

Figure 3: Triplet loss weighting schemes, with emphasis on adaptive weights for stability and accuracy.

MTMC Tracker

The tracker formulates identity labeling as correlation clustering: matrices collect appearance and motion correlations, discounted over time, and the objective is to maximize the positive correlation within clusters while adhering to transitivity constraints.
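The paper solves correlation clustering as a binary integer program; a greedy agglomerative approximation conveys the objective (illustrative only, not the authors' solver):

```python
import numpy as np

def correlation_cluster(W):
    """Greedy correlation clustering sketch.

    W[i, j] holds the appearance/motion correlation between tracklets
    i and j (positive = likely same identity, negative = different).
    Clusters are merged while the merge adds positive total
    correlation; transitivity holds by construction, since cluster
    membership is a single label per tracklet.
    """
    n = W.shape[0]
    labels = list(range(n))
    improved = True
    while improved:
        improved = False
        for a in sorted(set(labels)):
            for b in sorted(set(labels)):
                if a >= b:
                    continue
                # Total correlation gained by merging clusters a and b.
                gain = sum(W[i, j]
                           for i in range(n) if labels[i] == a
                           for j in range(n) if labels[j] == b)
                if gain > 0:
                    labels = [a if l == b else l for l in labels]
                    improved = True
    return labels
```

The greedy scheme is only a stand-in for the exact optimization, but it makes the objective concrete: positively correlated tracklets end up with the same label, negatively correlated ones stay apart.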

Multi-level processing is implemented, starting from one-second tracklets and building up to multi-camera trajectories. The hierarchical approach reduces computational complexity and maintains accuracy across temporal windows.

Figure 4: Relation between validation and test sets indicative of feature learning stability and performance consistency.

Experiments and Results

Experiments on the DukeMTMC and Market-1501 datasets demonstrate superior MTMC tracking, with higher IDF1 and identity recall than existing methods. The adaptive weighted triplet loss and hard-identity mining prove instrumental in enhancing the effectiveness of the learned features.

Figure 5: Relation of tracking, correlation, and rank accuracy highlighting diminishing returns in MTMCT beyond a certain rank accuracy threshold.

Re-ID Performance

The analysis reveals a relationship between MTMCT tracking accuracy and Re-ID rank accuracy: improvements in Re-ID yield diminishing returns in MTMCT after a certain point. This saturation effect underscores that once correlations are mostly accurate, further accuracy in ranking has a limited impact on MTMCT.

Conclusion

The paper successfully presents a method leveraging adaptive weighted triplet loss and hard-identity mining, achieving significant advances in both MTMCT and Re-ID tasks. Future exploration of larger datasets might further validate the robustness of the proposed solutions and facilitate advancements in video surveillance and Re-ID technologies.

Figure 6: Tracking failures exemplifying scenarios with poor precision and recall, highlighting areas for further improvement.
