Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking

Published 6 Sep 2016 in cs.CV | (1609.01775v2)

Abstract: To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.

Citations (2,467)

Summary

The paper introduces rigorous performance measures that link coverage metrics directly to precision and recall in tracking systems.
It establishes a formal framework and notation for minimum-cost bipartite matching in detection and tracking applications.
The derived metrics offer robust benchmarking tools for optimizing tracking performance in fields like autonomous driving and surveillance.

Analysis of Minimum-Cost Bipartite Matching in Detection and Tracking Frameworks

This paper presents a rigorous treatment of the minimum-cost bipartite matching problem within the context of detection and tracking. The principal objective is to identify a one-to-one matching between ground-truth and computed trajectories that minimizes the total number of false positive and false negative frames, that is, the number of mis-assigned frames. The work focuses on formalizing the notation and deriving proofs that link coverage metrics directly to standard performance metrics such as precision and recall.

Notation and Definitions

At the core of the problem is the determination of matches between the set of ground-truth trajectories and the set of computed trajectories. The problem is framed in terms of four quantities: True Positive ID ($TPID$), False Positive ID ($FPID$), False Negative ID ($FNID$), and True Negative ID ($TNID$). The matches $(\tau, \gamma)$ in $TPID$ define a bijective mapping between the matched ground-truth trajectories (MT) and the matched computed trajectories (MC), giving each matched true identity exactly one computed counterpart.
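The matching problem described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: trajectories are modeled as dicts from frame index to an `(x, y)` position, a miss is declared when positions differ by more than `delta`, and the optimum is found by brute-force enumeration rather than by a dedicated assignment solver such as the Hungarian algorithm. The names `mismatches`, `pair_cost`, and `best_matching` are hypothetical and do not come from the paper.

```python
from itertools import permutations
from math import dist

def mismatches(a, b, delta):
    """Frames of a with no detection of b within delta (b=None: every frame)."""
    if b is None:
        return len(a)
    return sum(1 for t, p in a.items() if t not in b or dist(p, b[t]) > delta)

def pair_cost(tau, gamma, delta):
    """False-negative frames on tau plus false-positive frames on gamma."""
    return mismatches(tau, gamma, delta) + mismatches(gamma, tau, delta)

def best_matching(truths, computed, delta):
    """Brute-force minimum-cost one-to-one matching (toy sizes only)."""
    n, k = len(truths), len(computed)
    best_cost, best_pairs = None, None
    # Slots 0..k-1 are computed trajectories; slots >= k mean "left unmatched".
    for slots in permutations(range(k + n), n):
        pairs = [(i, s) for i, s in enumerate(slots) if s < k]
        used = {s for _, s in pairs}
        cost = sum(pair_cost(truths[i], computed[s], delta) for i, s in pairs)
        # An unmatched true trajectory contributes all its frames as false negatives.
        cost += sum(len(truths[i]) for i, s in enumerate(slots) if s >= k)
        # An unmatched computed trajectory contributes all its frames as false positives.
        cost += sum(len(computed[s]) for s in range(k) if s not in used)
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_cost, best_pairs

truths = [
    {0: (0, 0), 1: (1, 0), 2: (2, 0)},   # true identity 0
    {0: (5, 5), 1: (5, 6)},              # true identity 1
]
computed = [
    {0: (5, 5), 1: (5, 6), 2: (5, 7)},   # follows identity 1, one extra frame
    {0: (0, 0), 1: (1, 0)},              # follows identity 0 on frames 0 and 1
    {2: (2, 0)},                         # fragment of identity 0 on frame 2
]
cost, pairs = best_matching(truths, computed, delta=0.5)
print(cost, pairs)  # 3 [(0, 1), (1, 0)]
```

In this toy scene the fragment on frame 2 stays unmatched, because each true identity may be paired with at most one computed trajectory. The enumeration grows factorially with the number of trajectories, so a polynomial-time assignment solver would be the natural choice for realistic inputs.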

The definitions extend to coverage metrics for both ground-truth trajectories ($\tau$) and computed trajectories ($\gamma$), parameterized by a resolution $\Delta$. The coverage of a trajectory is its length in frames minus the number of frames in which it fails to match its partner trajectory, so it counts the frames on which the computed identity agrees with the ground truth.

Deriving Coverage Metrics

The coverage of a ground-truth trajectory $\tau$ and of a computed trajectory $\gamma$ is defined as:

$\text{cov}(\tau, \Delta) = \text{len}(\tau) - \sum\limits_{t \in \mathcal{T}_{\tau}} m(\tau, \gamma_{m}(\tau), t, \Delta)$
$\text{cov}(\gamma, \Delta) = \text{len}(\gamma) - \sum\limits_{t \in \mathcal{T}_{\gamma}} m(\tau_{m}(\gamma), \gamma, t, \Delta)$

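Here, $\text{len}(\cdot)$ is the number of detections in the respective trajectory, $\gamma_{m}(\tau)$ and $\tau_{m}(\gamma)$ denote the partner that the matching assigns to a trajectory, and $m(\cdot, \cdot, t, \Delta)$ flags a miss at frame $t$. The summation terms therefore count the mismatched frames: false negatives for $\tau$ and false positives for $\gamma$.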
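The two coverage formulas above can be sketched in a few lines. This is an illustrative reading under stated assumptions, not the authors' implementation: a trajectory is a dict from frame index to an `(x, y)` position, the miss test compares Euclidean distance against `delta`, and a trajectory with no partner (`None`) misses on every frame. The function names `miss` and `coverage` are hypothetical.

```python
from math import dist

def miss(traj, partner, t, delta):
    """Return 1 if traj has no agreeing partner detection at frame t, else 0."""
    if partner is None or t not in traj or t not in partner:
        return 1
    return 0 if dist(traj[t], partner[t]) <= delta else 1

def coverage(traj, partner, delta):
    """len(traj) minus the number of frames on which traj misses its partner."""
    return len(traj) - sum(miss(traj, partner, t, delta) for t in traj)

# Hypothetical matched pair: tau is ground truth, gamma is tracker output.
tau = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
gamma = {1: (1.1, 0.0), 2: (2.0, 0.2), 3: (9.0, 0.0), 4: (4.0, 0.0)}

cov_tau = coverage(tau, gamma, delta=0.5)    # frames 0 and 3 are false negatives
cov_gamma = coverage(gamma, tau, delta=0.5)  # frames 3 and 4 are false positives
print(cov_tau, cov_gamma)  # 2 2
```

For a matched pair the two coverages are equal, since both count the frames on which the pair agrees; the trajectories differ only in how many of their own frames are left uncovered.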

Ground-truth coverage $\text{cov}_T(\Delta)$ is obtained by normalizing over the set of all ground-truth trajectories (AT), and tracker-output coverage $\text{cov}_C(\Delta)$ by normalizing over the set of all computed trajectories (AC). Both values lie between 0 and 1 and correspond to recall and precision, respectively.

Proof and Theoretical Validation

The authors present an exhaustive case-based proof that establishes the correctness of these coverage metrics, showing that the coverage values $\text{cov}_T(\Delta)$ and $\text{cov}_C(\Delta)$ equal recall (R) and precision (P), respectively.

Recall (R) is validated as:
- $R = \frac{TPID}{TPID + FNID} = \frac{\sum_{\tau \in MT} \text{cov}(\tau, \Delta)}{\sum_{\tau \in AT} \text{len}(\tau)} = \text{cov}_T(\Delta)$
Precision (P) is validated as:
- $P = \frac{TPID}{TPID + FPID} = \frac{\sum_{\gamma \in MC} \text{cov}(\gamma, \Delta)}{\sum_{\gamma \in AC} \text{len}(\gamma)} = \text{cov}_C(\Delta)$

Through these detailed proofs, the paper ensures the reliability of its proposed metrics for evaluating detection and tracking systems.
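As a small numeric check of the two identities above, the sketch below computes precision and recall from frame counts and compares them with the normalized coverages. The inputs are hypothetical: two ground-truth trajectories of 3 and 2 frames, three computed trajectories of 3, 2, and 1 frames, and two matched pairs that each agree on 2 frames. The function name `id_precision_recall` is an assumption for illustration, and the code assumes at least one frame on each side so that no denominator is zero.

```python
from fractions import Fraction

def id_precision_recall(matched_cov, truth_lengths, computed_lengths):
    """Precision and recall from per-match coverage and trajectory lengths."""
    tpid = sum(matched_cov)             # frames covered by the matching
    fnid = sum(truth_lengths) - tpid    # ground-truth frames left uncovered
    fpid = sum(computed_lengths) - tpid # computed frames left uncovered
    precision = Fraction(tpid, tpid + fpid)
    recall = Fraction(tpid, tpid + fnid)
    return precision, recall

matched_cov = [2, 2]          # coverage of each matched pair (hypothetical)
truth_lengths = [3, 2]        # len(tau) for every tau in AT
computed_lengths = [3, 2, 1]  # len(gamma) for every gamma in AC

precision, recall = id_precision_recall(matched_cov, truth_lengths, computed_lengths)

# Normalized coverages, computed directly from their definitions.
cov_T = Fraction(sum(matched_cov), sum(truth_lengths))
cov_C = Fraction(sum(matched_cov), sum(computed_lengths))
print(precision, recall)  # 2/3 4/5
```

Exact fractions are used so that the equalities `recall == cov_T` and `precision == cov_C` can be checked without floating-point rounding.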

Implications and Future Directions

The implications of this work are both practical and theoretical. Practically, the rigorous formulation allows for more precise benchmarking of tracking algorithms, with direct relevance to fields such as autonomous driving, surveillance, and motion analysis. Theoretically, linking coverage metrics to established performance metrics such as recall and precision simplifies the interpretation of tracking accuracy.

Future research might focus on extending these concepts to more complex multi-target scenarios, where interactions between multiple entities may complicate matching and coverage calculations. Moreover, exploring how different values of the resolution parameter $\Delta$ affect the accuracy and robustness of the coverage metrics could provide deeper insights into optimizing tracking algorithms.

In summary, this paper contributes to the structured understanding of matching problems in detection and tracking, equipping researchers with precise tools for evaluation and optimization of tracking systems.
