Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression
Abstract: Classical structure-based visual localization methods offer high accuracy but face trade-offs in storage, speed, and privacy. A recent keypoint scene coordinate regression (KSCR) method, D2S, addresses these issues by leveraging graph attention networks to model relationships among keypoints and predict their 3D coordinates with a simple multilayer perceptron (MLP); the camera pose is then recovered via PnP+RANSAC from the established 2D-3D correspondences. While KSCR achieves competitive results, rivaling state-of-the-art retrieval-based methods such as HLoc across multiple benchmarks, its performance degrades when training samples are limited, owing to the deep learning model's reliance on extensive data. This paper addresses that challenge by introducing a pipeline for keypoint descriptor synthesis using Neural Radiance Fields (NeRF). By sampling novel poses and feeding them into a trained NeRF model to render new views, our approach enhances KSCR's generalization in data-scarce environments. The proposed system improves localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis. Furthermore, its modular design allows multiple NeRFs to be integrated, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: https://github.com/ais-lab/DescriptorSynthesis4Feat2Map.
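The first step of the pipeline described above is sampling novel camera poses near the training poses before rendering them with the trained NeRF. The following is a minimal sketch of that sampling step, assuming poses are stored as 4x4 camera-to-world matrices from the SfM reconstruction; the function name and the perturbation bounds (`rot_deg`, `trans`) are illustrative, not the paper's exact parameters.

```python
import numpy as np

def perturb_pose(c2w, rot_deg=5.0, trans=0.05, rng=None):
    """Sample a novel camera-to-world pose near a training pose.

    c2w      : 4x4 camera-to-world matrix (rotation + translation).
    rot_deg  : max rotation perturbation in degrees (assumed bound).
    trans    : max translation perturbation per axis (assumed bound).
    """
    rng = rng or np.random.default_rng()
    # Random rotation axis and a small angle within +/- rot_deg.
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = np.deg2rad(rng.uniform(-rot_deg, rot_deg))
    # Rodrigues' formula: rotation matrix from axis-angle.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    # Compose the small SE(3) perturbation with the training pose.
    delta = np.eye(4)
    delta[:3, :3] = R
    delta[:3, 3] = rng.uniform(-trans, trans, size=3)
    return c2w @ delta

# Each novel pose would then be rendered by the trained NeRF, and
# keypoint descriptors (e.g. SuperPoint) extracted from the rendering.
novel = perturb_pose(np.eye(4), rng=np.random.default_rng(0))
# The rotation block stays a valid rotation matrix:
assert np.allclose(novel[:3, :3] @ novel[:3, :3].T, np.eye(3))
```

The rendered views only need to yield usable local descriptors, not photorealistic images, which is why the paper reports data synthesis costing a fraction of the usual acquisition time.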
- J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4104–4113.
- B.-T. Bui, D.-T. Tran, and J.-H. Lee, “D2S: Representing local descriptors and global scene coordinates for camera relocalization,” Dec. 2023, arXiv:2307.15250 [cs].
- A. Kendall, M. Grimes, and R. Cipolla, “Posenet: A convolutional network for real-time 6-dof camera relocalization,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 2938–2946.
- T. B. Bach, T. T. Dinh, and J.-H. Lee, “FeatLoc: Absolute pose regressor for indoor 2D sparse features with simplistic view synthesizing,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 189, pp. 50–62, July 2022.
- L. Zhou, Z. Luo, T. Shen, J. Zhang, M. Zhen, Y. Yao, T. Fang, and L. Quan, “Kfnet: Learning temporal camera relocalization using kalman filtering,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 4919–4928.
- Q. Zhou, T. Sattler, M. Pollefeys, and L. Leal-Taixe, “To learn or not to learn: Visual localization from essential matrices,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3319–3326.
- E. Brachmann, A. Krull, S. Nowozin, J. Shotton, F. Michel, S. Gumhold, and C. Rother, “DSAC - differentiable RANSAC for camera localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6684–6692.
- E. Brachmann and C. Rother, “Learning less is more - 6D camera localization via 3D surface regression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4654–4662.
- X. Li, S. Wang, Y. Zhao, J. Verbeek, and J. Kannala, “Hierarchical scene coordinate classification and regression for visual localization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11983–11992.
- S. Dong, S. Wang, Y. Zhuang, J. Kannala, M. Pollefeys, and B. Chen, “Visual localization via few-shot scene region classification,” in 2022 International Conference on 3D Vision (3DV). IEEE, 2022, pp. 393–402.
- Z. Kukelova, M. Bujnak, and T. Pajdla, “Real-time solution to the absolute pose problem with unknown radial distortion and focal length,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2816–2823.
- P.-E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12716–12725.
- B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.
- A. Moreau, N. Piasco, D. Tsishkou, B. Stanciulescu, and A. de La Fortelle, “Lens: Localization enhanced by nerf synthesis,” in Conference on Robot Learning. PMLR, 2022, pp. 1347–1356.
- M. Tancik, E. Weber, E. Ng, R. Li, B. Yi, J. Kerr, T. Wang, A. Kristoffersen, J. Austin, K. Salahi, A. Ahuja, D. McAllister, and A. Kanazawa, “Nerfstudio: A Modular Framework for Neural Radiance Field Development,” in Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Proceedings, July 2023, pp. 1–12, arXiv:2302.04264 [cs].
- D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-Supervised Interest Point Detection and Description,” Apr. 2018, arXiv:1712.07629 [cs].
- P. Lindenberger, P.-E. Sarlin, and M. Pollefeys, “Lightglue: Local feature matching at light speed,” arXiv preprint arXiv:2306.13643, 2023.
- J. L. Schönberger, E. Zheng, J.-M. Frahm, and M. Pollefeys, “Pixelwise view selection for unstructured multi-view stereo,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14. Springer, 2016, pp. 501–518.
- M. Tyszkiewicz, P. Fua, and E. Trulls, “DISK: Learning local features with policy gradient,” Advances in Neural Information Processing Systems, vol. 33, pp. 14254–14265, 2020.
- M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, “D2-net: A trainable cnn for joint detection and description of local features,” arXiv preprint arXiv:1905.03561, 2019.
- J. Revaud, P. Weinzaepfel, C. De Souza, N. Pion, G. Csurka, Y. Cabon, and M. Humenberger, “R2d2: repeatable and reliable detector and descriptor,” arXiv preprint arXiv:1906.06195, 2019.
- R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5297–5307.
- A. Gordo, J. Almazan, J. Revaud, and D. Larlus, “End-to-end learning of deep visual representations for image retrieval,” International Journal of Computer Vision, vol. 124, no. 2, pp. 237–254, 2017.
- P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 4938–4947.
- A. Bergamo, S. N. Sinha, and L. Torresani, “Leveraging structure from motion to learn discriminative codebooks for scalable landmark classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 763–770.
- S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz, “Geometry-aware learning of maps for camera localization,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2616–2625.
- B. Wang, C. Chen, C. X. Lu, P. Zhao, N. Trigoni, and A. Markham, “AtLoc: Attention guided camera localization,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 06, 2020, pp. 10393–10401.
- T. Sattler, Q. Zhou, M. Pollefeys, and L. Leal-Taixe, “Understanding the limitations of cnn-based absolute camera pose regression,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 3302–3312.
- T. Ng, A. Lopez-Rodriguez, V. Balntas, and K. Mikolajczyk, “Reassessing the limitations of cnn methods for camera pose regression,” arXiv preprint arXiv:2108.07260, 2021.
- E. Brachmann and C. Rother, “Visual camera re-localization from RGB and RGB-D images using DSAC,” TPAMI, 2021.
- F. Pittaluga, S. J. Koppal, S. B. Kang, and S. N. Sinha, “Revealing Scenes by Inverting Structure from Motion Reconstructions,” Apr. 2019, arXiv:1904.03303 [cs].
- J. Zhang, S. Tang, K. Qiu, R. Huang, C. Fang, L. Cui, Z. Dong, S. Zhu, and P. Tan, “Rendernet: Visual relocalization using virtual viewpoints in large-scale indoor environments,” arXiv preprint arXiv:2207.12579, 2022.
- K. Liu, Q. Li, and G. Qiu, “Posegan: A pose-to-image translation framework for camera localization,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 308–315, 2020.
- L. Chen, W. Chen, R. Wang, and M. Pollefeys, “Leveraging neural radiance fields for uncertainty-aware visual localization,” arXiv preprint arXiv:2310.06984, 2023.
- J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan, “Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, QC, Canada: IEEE, Oct. 2021, pp. 5835–5844.
- J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, “Mip-nerf 360: Unbounded anti-aliased neural radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5470–5479.
- T. Müller, A. Evans, C. Schied, and A. Keller, “Instant Neural Graphics Primitives with a Multiresolution Hash Encoding,” ACM Transactions on Graphics, vol. 41, no. 4, pp. 1–15, July 2022, arXiv:2201.05989 [cs].
- Z. Wang, S. Wu, W. Xie, M. Chen, and V. A. Prisacariu, “NeRF--: Neural radiance fields without known camera parameters,” arXiv preprint arXiv:2102.07064, 2021.
- J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, “Scene coordinate regression forests for camera relocalization in rgb-d images,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 2930–2937.
- J. Valentin, A. Dai, M. Nießner, P. Kohli, P. Torr, S. Izadi, and C. Keskin, “Learning to navigate the energy landscape,” in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 323–332.