Multiple Object Tracking: A Literature Review

Published 26 Sep 2014 in cs.CV | (1409.7618v5)

Abstract: Multiple Object Tracking (MOT) has gained increasing attention due to its academic and commercial potential. Although different approaches have been proposed to tackle this problem, it still remains challenging due to factors like abrupt appearance changes and severe object occlusions. In this work, we contribute the first comprehensive and most recent review on this problem. We inspect the recent advances in various aspects and propose some interesting directions for future research. To the best of our knowledge, there has not been any extensive review on this topic in the community. We endeavor to provide a thorough review on the development of this problem in recent decades. The main contributions of this review are fourfold: 1) Key aspects in an MOT system, including formulation, categorization, key principles, evaluation of MOT are discussed; 2) Instead of enumerating individual works, we discuss existing approaches according to various aspects, in each of which methods are divided into different groups and each group is discussed in detail for the principles, advances and drawbacks; 3) We examine experiments of existing publications and summarize results on popular datasets to provide quantitative and comprehensive comparisons. By analyzing the results from different perspectives, we have verified some basic agreements in the field; and 4) We provide a discussion about issues of MOT research, as well as some interesting directions which will become potential research effort in the future.

Summary

The paper unifies fragmented research on multiple object tracking by formalizing it as a MAP estimation problem and categorizing approaches.
It details key MOT components such as appearance, motion, and interaction models while addressing occlusion handling and exclusion constraints.
It evaluates performance with standard metrics like MOTA and MOTP and outlines future directions including deep learning and scene understanding.

Multiple Object Tracking: A Comprehensive Literature Review

The paper "Multiple Object Tracking: A Literature Review" by Wenhan Luo et al. offers a thorough examination of the state-of-the-art in Multiple Object Tracking (MOT). MOT is a pivotal task in computer vision, with applications ranging from visual surveillance to autonomous driving. The paper aims to unify various fragmented research efforts in the field and provide a structured overview encompassing problem formulation, key components, evaluation metrics, and future research directions.

Problem Formulation and Categorization

The authors begin by formalizing the MOT problem within a probabilistic framework. Object states are treated as uncertain quantities described by a distribution, and the goal is to estimate the most probable states given a sequence of observations. With $\mathbf{S}_{1:t}$ denoting the states of all objects from the first frame to frame $t$ and $\mathbf{O}_{1:t}$ denoting the corresponding observations, the general objective is framed as a Maximum A Posteriori (MAP) estimation problem:

$\widehat{\mathbf{S}}_{1:t} = \underset{\mathbf{S}_{1:t}}{\operatorname{argmax}} \ P\left(\mathbf{S}_{1:t}|\mathbf{O}_{1:t}\right)$

This formulation can be solved from either a probabilistic inference perspective or a deterministic optimization perspective.
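
From the probabilistic inference perspective, the MAP objective is commonly approached with a recursive two-step Bayesian estimate. The lines below are a sketch of that standard recursion, assuming a first-order Markov dynamic model $P(\mathbf{S}_t|\mathbf{S}_{t-1})$ and an observation model $P(\mathbf{O}_t|\mathbf{S}_t)$. Both model terms are the usual filtering assumptions and are not stated in the summary above.

Predict: $P(\mathbf{S}_t|\mathbf{O}_{1:t-1}) = \int P(\mathbf{S}_t|\mathbf{S}_{t-1}) \, P(\mathbf{S}_{t-1}|\mathbf{O}_{1:t-1}) \, d\mathbf{S}_{t-1}$

Update: $P(\mathbf{S}_t|\mathbf{O}_{1:t}) \propto P(\mathbf{O}_t|\mathbf{S}_t) \, P(\mathbf{S}_t|\mathbf{O}_{1:t-1})$

The deterministic optimization perspective instead maximizes the posterior (or minimizes an equivalent energy) directly over the observations that are available.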

To provide a clearer understanding of the different methodologies within MOT, the paper categorizes existing approaches based on three criteria:

Initialization Method: Differentiates between Detection-Based Tracking (DBT) and Detection-Free Tracking (DFT).
Processing Mode: Distinguishes between online (sequential) and offline (batch) tracking methods.
Type of Output: Differentiates between deterministic and probabilistic outputs.

Key Components in MOT Systems

Appearance Model

Appearance models are crucial for affinity computation in MOT. They consist of a visual representation, which describes each object, and a statistical measure, which quantifies the similarity between two observations. Visual representations include local features (KLT, optical flow), region features (color histogram, HOG, covariance matrix), and depth features. When several cues are used, they are fused into a single affinity through strategies such as boosting, concatenation, summation, product, and cascading.
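
As a minimal sketch of an appearance affinity, the example below compares two color histograms with the Bhattacharyya coefficient and fuses several cue affinities by product. The Bhattacharyya coefficient is one common choice of statistical measure, picked here for illustration; the function names are hypothetical.

```python
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya_affinity(hist_a, hist_b):
    """Bhattacharyya coefficient in [0, 1]; 1 means identical distributions."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(hist_a), normalize(hist_b)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def fuse_by_product(affinities):
    """Product fusion: treats the cues as independent (an assumption)."""
    result = 1.0
    for a in affinities:
        result *= a
    return result


# Hypothetical 4-bin color histograms of two detections.
same = bhattacharyya_affinity([2, 2, 0, 0], [1, 1, 0, 0])
different = bhattacharyya_affinity([1, 1, 0, 0], [0, 0, 1, 1])
```

Product fusion drives the combined affinity toward zero whenever any single cue disagrees, whereas summation is more forgiving of one weak cue.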

Motion Model

Motion models predict the future positions of objects, reducing the search space and thereby enhancing tracking accuracy. The authors discuss both linear (e.g., constant velocity models) and non-linear motion models, which can handle more complex tracking scenarios.
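
The constant velocity idea can be sketched in a few lines. The example assumes 2D image coordinates, a fixed time step, and a simple circular gate around the prediction; the gate radius and all function names are illustrative assumptions, not part of the paper.

```python
import math


def estimate_velocity(prev_position, curr_position, dt=1.0):
    """Finite-difference velocity from two consecutive positions."""
    return ((curr_position[0] - prev_position[0]) / dt,
            (curr_position[1] - prev_position[1]) / dt)


def predict_constant_velocity(position, velocity, dt=1.0):
    """Linear motion model: next position = position + velocity * dt."""
    return (position[0] + velocity[0] * dt,
            position[1] + velocity[1] * dt)


def within_gate(predicted, detection, radius):
    """Keep only detections close to the prediction (reduced search space)."""
    return math.hypot(detection[0] - predicted[0],
                      detection[1] - predicted[1]) <= radius


velocity = estimate_velocity((8.0, 21.0), (10.0, 20.0))
predicted = predict_constant_velocity((10.0, 20.0), velocity)
```

Only detections that pass `within_gate` need to be scored against the track, which is how the prediction shrinks the search space.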

Interaction Model

Interaction models capture the influence of objects on each other, particularly useful in crowded scenarios. Two primary types are social force models, which include individual and group forces, and crowd motion pattern models, which leverage learned motion patterns in high-density environments.

Exclusion Model

Exclusion models enforce the constraint that physical objects cannot occupy the same space. They are applied at the detection level (two detections in the same frame cannot be assigned to the same object) and at the trajectory level (two trajectories cannot come arbitrarily close to each other).

Occlusion Handling

Occlusion handling remains a significant challenge in MOT. Strategies include part-to-whole methods (tracking the visible parts of occluded objects), hypothesize-and-test methods (generating and testing occlusion hypotheses), and buffer-and-recover methods (temporarily buffering occluded objects and recovering their trajectories once the occlusion ends).
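
A buffer-and-recover strategy can be approximated by keeping a track alive for a limited number of missed frames. The class below is a hedged sketch of that bookkeeping only; `BufferedTrack` and `max_missed` are hypothetical names, and a real system would also buffer observations and predict the state during the gap.

```python
class BufferedTrack:
    """Track that survives short occlusions by remembering its last state."""

    def __init__(self, track_id, position, max_missed=5):
        self.track_id = track_id
        self.position = position      # last confirmed position
        self.missed = 0               # consecutive frames without a match
        self.max_missed = max_missed  # buffer length (assumed parameter)

    def mark_missed(self):
        """Call when no detection was matched in the current frame."""
        self.missed += 1

    def recover(self, position):
        """Call when a detection is matched again after the occlusion."""
        self.position = position
        self.missed = 0

    @property
    def alive(self):
        """The track is dropped once the buffer is exhausted."""
        return self.missed <= self.max_missed


track = BufferedTrack(track_id=1, position=(0.0, 0.0), max_missed=2)
track.mark_missed()
track.mark_missed()
still_buffered = track.alive   # occluded for 2 frames, still kept
track.recover((3.0, 0.0))      # object reappears and keeps its identity
```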

Inference

The inference process in MOT can be probabilistic, using models like the Kalman filter and particle filter, or deterministic, using optimization techniques like bipartite matching and network flow.
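
For the deterministic case, bipartite matching assigns each track to exactly one detection so that the total cost is minimal. The sketch below solves a small square assignment problem by exhaustive enumeration, which is exact but only practical for a handful of objects; it is not the Hungarian algorithm, and the cost values are made up for illustration.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Exact minimum-cost one-to-one assignment by brute force.

    cost[i][j] is the cost of matching track i to detection j.
    Returns (assignment, total) where assignment[i] is the detection
    index chosen for track i. Assumes a square cost matrix.
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical costs, e.g. 1 - affinity between 3 tracks and 3 detections.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = min_cost_assignment(cost)
```

Because every track receives a distinct detection, the one-to-one matching also satisfies a detection-level exclusion constraint by construction. For larger problems a polynomial-time solver such as the Hungarian algorithm would replace the enumeration.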

Evaluation of MOT Systems

Metrics

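Evaluation metrics for MOT cover both detection and tracking. Detection metrics include Recall, Precision, False Alarms per Frame (FAF), and Multiple Object Detection Accuracy (MODA). Tracking metrics include Multiple Object Tracking Accuracy (MOTA), Multiple Object Tracking Precision (MOTP), and the number of ID switches (IDS). These metrics enable quantitative comparison between MOT approaches.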
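
As a worked illustration, the sketch below computes MOTA and MOTP using the widely used CLEAR MOT definitions, with counts accumulated over all frames. The overlap-based form of MOTP is assumed here, and the input numbers are invented for the example.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDS) / GT, summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def motp(match_overlaps):
    """MOTP as the mean overlap (e.g. IoU) over all matched pairs."""
    if not match_overlaps:
        raise ValueError("at least one match is required")
    return sum(match_overlaps) / len(match_overlaps)


def false_alarms_per_frame(false_positives, num_frames):
    """FAF: average number of false positives per frame."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return false_positives / num_frames


# Invented counts: 100 ground-truth boxes over 10 frames.
score_mota = mota(false_negatives=10, false_positives=5,
                  id_switches=5, num_gt=100)
score_motp = motp([0.5, 0.75, 1.0])
score_faf = false_alarms_per_frame(false_positives=5, num_frames=10)
```

MOTA can be negative when the number of errors exceeds the number of ground-truth objects, so it is not bounded below by zero.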

Datasets and Public Algorithms

Public datasets (e.g., KITTI, PETS, MOT16) provide a standardized benchmark for evaluating MOT algorithms. The paper also lists various publicly available algorithms, promoting transparency and reproducibility in research.

Implications and Future Directions

The review highlights several existing issues in current MOT research, such as dependency on object detectors and the challenges of parameter tuning and generalization across different datasets. To address these issues, potential future research directions include:

Video Adaptation: Adapting object detectors to specific video contexts.
Multi-Camera and 3D MOT: Leveraging multiple camera setups or 3D models for improved tracking performance.
Scene Understanding: Integrating contextual information and scene understanding into tracking algorithms.
Deep Learning: Harnessing the power of deep learning for object detection and trajectory estimation.

By systematically summarizing the state of research in MOT, this paper serves as a valuable resource for both new and seasoned researchers, guiding future work towards addressing the open challenges and advancing the field.