JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition
Abstract: Visual Place Recognition aims at recognizing previously visited places by relying on visual cues, and it is used in robotics applications for SLAM and localization. Since a mobile robot typically has access to a continuous stream of frames, this task is naturally cast as a sequence-to-sequence localization problem. Nevertheless, obtaining sequences of labelled data is much more expensive than collecting isolated images, which can be done in an automated way with little supervision. To mitigate this problem, we propose a novel Joint Image and Sequence Training protocol (JIST) that leverages large uncurated sets of images through a multi-task learning framework. With JIST we also introduce SeqGeM, an aggregation layer that revisits the popular GeM pooling to produce a single robust and compact embedding from a sequence of single-frame embeddings. We show that our model outperforms the previous state of the art while being faster, using 8 times smaller descriptors, having a lighter architecture, and allowing it to process sequences of various lengths. Code is available at https://github.com/ga1i13o/JIST
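As a rough illustration of the idea behind SeqGeM, the following is a minimal sketch of GeM-style pooling applied across the temporal axis of per-frame embeddings, yielding one compact sequence descriptor for any sequence length. The function name, the parameter p, and the final L2 normalization are illustrative assumptions, not the paper's exact implementation (which is learned end-to-end).

```python
import numpy as np

def seq_gem_pool(frame_embeddings: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """GeM-style pooling over a sequence of frame descriptors (illustrative sketch).

    frame_embeddings: array of shape (T, D), one D-dim descriptor per frame.
    Returns a single L2-normalized D-dim sequence descriptor.
    """
    x = np.clip(frame_embeddings, eps, None)       # GeM assumes non-negative activations
    pooled = np.mean(x ** p, axis=0) ** (1.0 / p)  # generalized mean over the T frames
    return pooled / np.linalg.norm(pooled)         # L2-normalize for cosine retrieval

# Works for sequences of any length T, matching the variable-length property.
descriptor = seq_gem_pool(np.random.rand(5, 512))
```

For p = 1 this reduces to average pooling, while large p approaches max pooling; in the paper the pooling exponent is presumably learned jointly with the rest of the network.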