SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow
Abstract: Making trajectory annotation from videos more efficient could enable the next generation of data-hungry tracking algorithms to thrive on large-scale datasets. Despite the importance of this task, very few works explore how to label tracking datasets both efficiently and comprehensively. In this work, we introduce SPAM, a video label engine that provides high-quality labels with minimal human intervention. SPAM is built around two key insights: i) most tracking scenarios can be easily resolved, so we use a pre-trained model to generate high-quality pseudo-labels and reserve human involvement for a smaller subset of more difficult instances; ii) the spatiotemporal dependencies of track annotations across time can be elegantly and efficiently formulated through graphs, so we use a unified graph formulation to address the annotation of both detections and identity association for tracks across time. Based on these insights, SPAM produces high-quality annotations at a fraction of the ground-truth labeling cost. We demonstrate that trackers trained on SPAM labels achieve performance comparable to those trained on human annotations while requiring only $3$–$20\%$ of the human labeling effort. Hence, SPAM paves the way towards highly efficient labeling of large-scale tracking datasets. We release all models and code.
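The first insight, pseudo-labeling the easy cases and sending only the hard ones to a human, can be illustrated with a minimal sketch. The abstract does not state how SPAM selects difficult instances, so everything here is an assumption made for illustration: the function names, the edge dictionary, ranking by binary entropy, and the 0.5 decision threshold are hypothetical and are not taken from the released code.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def split_by_uncertainty(edge_probs, budget_fraction):
    """Route the most uncertain graph edges to a human, pseudo-label the rest.

    edge_probs: dict mapping (node_a, node_b) -> model probability that the
                two detections belong to the same track (hypothetical input).
    budget_fraction: share of edges a human may label, e.g. 0.03 to 0.20.
    """
    # Rank edges from most to least uncertain (assumed selection criterion).
    ranked = sorted(
        edge_probs,
        key=lambda edge: binary_entropy(edge_probs[edge]),
        reverse=True,
    )
    n_human = math.ceil(budget_fraction * len(ranked))
    to_human = ranked[:n_human]
    # Remaining edges keep the thresholded model prediction as a pseudo-label.
    pseudo = {edge: edge_probs[edge] >= 0.5 for edge in ranked[n_human:]}
    return to_human, pseudo


# Toy association graph: nodes are detections, edges link consecutive frames.
edges = {
    ("f1_d0", "f2_d0"): 0.98,
    ("f1_d0", "f2_d1"): 0.03,
    ("f1_d1", "f2_d1"): 0.55,
    ("f1_d1", "f2_d0"): 0.40,
    ("f2_d0", "f3_d0"): 0.91,
}
to_human, pseudo = split_by_uncertainty(edges, budget_fraction=0.2)
```

With a 20% budget on five edges, only the single most ambiguous edge (probability 0.55) is routed to a human, while the other four receive pseudo-labels. Expressing both detection and association decisions as nodes and edges of one graph is what lets a single routine like this cover both annotation types.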