
JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition

Published 28 Mar 2024 in cs.CV (arXiv:2403.19787v1)

Abstract: Visual Place Recognition aims to recognize previously visited places from visual cues, and it is used in robotics applications for SLAM and localization. Since a mobile robot typically has access to a continuous stream of frames, this task is naturally cast as a sequence-to-sequence localization problem. Nevertheless, obtaining sequences of labelled data is much more expensive than collecting isolated images, which can be done in an automated way with little supervision. To mitigate this problem, we propose a novel Joint Image and Sequence Training protocol (JIST) that leverages large uncurated sets of images through a multi-task learning framework. With JIST we also introduce SeqGeM, an aggregation layer that revisits the popular GeM pooling to produce a single robust and compact embedding from a sequence of single-frame embeddings. We show that our model outperforms the previous state of the art while being faster, using 8 times smaller descriptors, having a lighter architecture, and being able to process sequences of various lengths. Code is available at https://github.com/ga1i13o/JIST
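The core idea behind a GeM-style sequence aggregator can be sketched as follows: raise each dimension of the frame embeddings to a power p, average across the time axis, and take the p-th root, yielding one fixed-size descriptor regardless of sequence length. This is an illustrative sketch of the generalized-mean idea the abstract describes, not the authors' actual SeqGeM layer (in which p is typically learned; the function name and signature here are hypothetical):

```python
import numpy as np

def seq_gem_sketch(frames: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean (GeM-style) aggregation of frame embeddings.

    frames: array of shape (seq_len, dim), one embedding per frame.
    Returns a single descriptor of shape (dim,).
    GeM assumes non-negative activations, hence the clamp to eps
    before taking the fractional power.
    """
    x = np.clip(frames, eps, None) ** p      # elementwise power
    return x.mean(axis=0) ** (1.0 / p)       # p-th root of the temporal mean
```

Note that the output dimension is independent of `seq_len`, which is consistent with the abstract's claim that the model can process sequences of various lengths; with p = 1 this reduces to plain temporal average pooling.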
