
Multi-Object Tracking as Attention Mechanism

Published 12 Jul 2023 in cs.CV (arXiv:2307.05874v1)

Abstract: We propose a conceptually simple, and therefore fast, multi-object tracking (MOT) model that requires no attached modules such as the Kalman filter, the Hungarian algorithm, transformer blocks, or graph networks. Conventional MOT models are built from multi-step modules like these, which makes their computational cost high. Our proposed end-to-end MOT model, TicrossNet, consists only of a base detector and a cross-attention module. As a result, the overhead of tracking does not increase significantly even as the number of instances ($N_t$) grows. We show that TicrossNet runs in real time: it achieves 32.6 FPS on MOT17 and 31.0 FPS on MOT20 (Tesla V100), the latter of which includes as many as >100 instances per frame. We also demonstrate that TicrossNet is robust to $N_t$, so it does not need to change the size of its base detector depending on $N_t$, as other models often do to stay real-time.
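The core idea of replacing a matching step (e.g. the Hungarian algorithm) with cross-attention can be illustrated with a minimal sketch: treat current-frame detection embeddings as queries and previous-frame track embeddings as keys, and read the association off the attention weights. This is an illustrative assumption about the mechanism, not the authors' implementation; the function name and toy embeddings are hypothetical.

```python
import numpy as np

def cross_attention_associate(prev_feats, curr_feats):
    """Soft-associate current detections with previous tracks.

    prev_feats: (N_prev, D) embeddings of tracked instances at frame t-1
    curr_feats: (N_curr, D) embeddings of detections at frame t
    Returns the best-matching previous-track index per detection and the
    full (N_curr, N_prev) attention-weight matrix.
    """
    d = prev_feats.shape[1]
    # Scaled dot-product scores: queries = current detections, keys = tracks.
    scores = curr_feats @ prev_feats.T / np.sqrt(d)
    # Softmax over previous tracks yields a soft assignment per detection.
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    matches = weights.argmax(axis=1)
    return matches, weights

# Toy example (hypothetical embeddings): two existing tracks, two detections.
prev = np.array([[1.0, 0.0], [0.0, 1.0]])
curr = np.array([[0.9, 0.1], [0.1, 0.9]])
matches, weights = cross_attention_associate(prev, curr)
```

Because the whole association is one matrix multiply plus a softmax, its cost scales gracefully with the number of instances, which is consistent with the paper's claim of robustness to $N_t$.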

