Multi-Object Tracking as Attention Mechanism
Abstract: We propose a conceptually simple, and therefore fast, multi-object tracking (MOT) model that requires no attached modules such as a Kalman filter, the Hungarian algorithm, transformer blocks, or graph networks. Conventional MOT models are built upon such multi-step modules, which makes their computational cost high. Our end-to-end MOT model, *TicrossNet*, consists only of a base detector and a cross-attention module. As a result, the overhead of tracking does not increase significantly even when the number of instances ($N_t$) grows. We show that TicrossNet runs *in real time*; specifically, it achieves 32.6 FPS on MOT17 and 31.0 FPS on MOT20 (Tesla V100), the latter of which contains as many as $>$100 instances per frame. We also demonstrate that TicrossNet is robust to $N_t$, so it does not have to change the size of the base detector depending on $N_t$, as other models often do for real-time processing.
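The core idea, tracking by cross-attention between frames rather than by a Kalman filter plus Hungarian matching, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the base detector yields one embedding vector per detection, and all names (`cross_attention_assign`, `prev_emb`, `curr_emb`) are hypothetical. Scaled dot-product attention between current-frame detections (queries) and previous-frame tracks (keys) produces a soft assignment matrix, and a hard ID per detection falls out of an argmax:

```python
import numpy as np

def cross_attention_assign(prev_emb, curr_emb):
    """Associate current detections with previous tracks via scaled
    dot-product cross-attention (illustrative sketch only).

    prev_emb: (num_tracks, d) embeddings of tracks from frame t-1.
    curr_emb: (num_dets, d) embeddings of detections in frame t.
    Returns the soft assignment matrix and per-detection track indices.
    """
    d = prev_emb.shape[1]
    # Attention logits: one row per current detection, one column per track.
    logits = curr_emb @ prev_emb.T / np.sqrt(d)
    # Softmax over tracks yields a soft assignment (rows sum to 1).
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn = exp / exp.sum(axis=1, keepdims=True)
    # Each detection inherits the ID of its highest-attention track.
    return attn, attn.argmax(axis=1)
```

Because this is a single batched matrix product, the cost grows smoothly with $N_t$ instead of requiring a separate combinatorial assignment step, which is the property the abstract attributes to the cross-attention design.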
- Yifu Zhang et al., “FairMOT: On the fairness of detection and re-identification in multiple object tracking,” in International Journal of Computer Vision, 2021.
- “Tracking objects as points,” in Proceedings of the European Conference on Computer Vision, 2020.
- Yifu Zhang et al., “ByteTrack: Multi-object tracking by associating every detection box,” in Proceedings of the European Conference on Computer Vision, 2022.
- “MOTR: End-to-end multiple-object tracking with transformer,” in Proceedings of the European Conference on Computer Vision, 2022.
- Peize Sun et al., “TransTrack: Multiple-object tracking with transformer,” in arXiv preprint arXiv:2012.15460, 2020.
- “Simple online and realtime tracking with a deep association metric,” in IEEE International Conference on Image Processing, 2017.
- “Deep affinity network for multiple object tracking,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
- “TransCenter: Transformers with dense representations for multiple-object tracking,” in arXiv preprint arXiv:2103.15145, 2021.
- Ashish Vaswani et al., “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, vol. 30, pp. 5998–6008.
- “MOT16: A benchmark for multi-object tracking,” in arXiv preprint arXiv:1603.00831, 2016.
- Patrick Dendorfer et al., “MOT20: A benchmark for multi object tracking in crowded scenes,” in arXiv preprint arXiv:2003.09003, 2020.
- “Objects as points,” in arXiv preprint arXiv:1904.07850, 2019.
- Peixuan Li and Jieyu Jin, “Time3D: End-to-end joint monocular 3D object detection and tracking for autonomous driving,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022.
- Pauli Virtanen et al., “SciPy 1.0: Fundamental algorithms for scientific computing in python,” Nature Methods, vol. 17, pp. 261–272, 2020.
- Jinlong Peng et al., “Chained-Tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking,” in Proceedings of the European Conference on Computer Vision, 2020.
- “Pedestrian detection: An evaluation of the state of the art,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
- “CityPersons: A diverse dataset for pedestrian detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- “Robust multiperson tracking from a mobile platform,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
- “End-to-end deep learning for person search,” in arXiv preprint arXiv:1604.01850, 2016.
- “Person re-identification in the wild,” in arXiv preprint arXiv:1604.02531, 2016.
- “CrowdHuman: A benchmark for detecting human in a crowd,” in arXiv preprint arXiv:1805.00123, 2018.
- “Evaluating multiple object tracking performance: the CLEAR MOT metrics,” in EURASIP Journal on Image and Video Processing, 2008.
- “YOLOX: Exceeding YOLO series in 2021,” in arXiv preprint arXiv:2107.08430, 2021.