OpenNavMap: Structure-Free Topometric Mapping via Large-Scale Collaborative Localization

Published 18 Jan 2026 in cs.RO and cs.CV | (2601.12291v1)

Abstract: Scalable and maintainable map representations are fundamental to enabling large-scale visual navigation and facilitating the deployment of robots in real-world environments. While collaborative localization across multi-session mapping enhances efficiency, traditional structure-based methods struggle with high maintenance costs and fail in feature-less environments or under significant viewpoint changes typical of crowd-sourced data. To address this, we propose OPENNAVMAP, a lightweight, structure-free topometric system leveraging 3D geometric foundation models for on-demand reconstruction. Our method unifies dynamic programming-based sequence matching, geometric verification, and confidence-calibrated optimization to achieve robust, coarse-to-fine submap alignment without requiring pre-built 3D models. Evaluations on the Map-Free benchmark demonstrate superior accuracy over structure-from-motion and regression baselines, achieving an average translation error of 0.62m. Furthermore, the system maintains global consistency across 15km of multi-session data with an absolute trajectory error below 3m for map merging. Finally, we validate practical utility through 12 successful autonomous image-goal navigation tasks on simulated and physical robots. Code and datasets will be publicly available at https://rpl-cs-ucl.github.io/OpenNavMap_page.


Summary

The paper proposes a novel mapping approach that leverages foundation models and hierarchical graphs to achieve robust, scalable localization.
It integrates dynamic programming-based sequence matching with confidence-calibrated metric optimization for accurate cross-device mapping.
Empirical results demonstrate sub-meter accuracy and an order-of-magnitude reduction in storage compared to traditional structure-based maps.

OpenNavMap: A Structure-Free Topometric Mapping System for Large-Scale Collaborative Localization

Introduction and Motivation

OpenNavMap addresses critical scalability and robustness limitations in large-scale visual navigation (VNav) by proposing a structure-free, topometric mapping paradigm tailored for collaborative, cross-device, multi-session mapping. Conventional structure-based approaches maintain explicit, dense 3D reconstructions for localization. OpenNavMap instead uses 3D geometric foundation models (GFMs) to infer scene geometry and camera parameters on demand, and stores the map as a lightweight graph whose nodes hold RGB images and associated visual/topometric metadata.

This shift significantly reduces map maintenance complexity and computational overhead, accommodates the heterogeneity of modern data sources (e.g., smartphones, AR glasses, street-view panoramas), and maintains robustness even in feature-poor environments or under significant temporal/viewpoint shifts. OpenNavMap is underpinned by a hierarchical integration of dynamic programming-based sequence matching, geometric verification, and confidence-calibrated optimization, which makes it independent of pre-built 3D metric maps and thus scalable for lifelong, crowd-sourced, or robot-captured mapping deployments.

Core System Architecture

The system models environments as a three-layer topometric graph: the Covisibility Graph (CvG), Odometry Graph (OdG), and Traversability Graph (TrG). Nodes in these graphs represent image observations, odometric poses, or traversable spatial locations, with multi-modal, multi-temporal attributes that support robust localization, global consistency, and real-time path planning. Data acquisition is distributed and device-agnostic, supporting keyframe-based map construction from RGB(-D), egocentric, vehicular, or panoramic platforms with varying intrinsic parameters.
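
As a rough illustration of the three-layer graph described above, the sketch below stores image nodes once and keeps a separate adjacency map per layer. The class names, the `(x, y, yaw)` pose layout, and the edge weights are assumptions made for this example; the source does not specify the actual data layout.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One keyframe: an RGB image plus lightweight metadata (hypothetical fields)."""
    node_id: int
    image_path: str
    pose: tuple = (0.0, 0.0, 0.0)  # assumed (x, y, yaw) odometric pose
    timestamp: float = 0.0


@dataclass
class TopometricMap:
    """Nodes shared by three edge layers: CvG, OdG, and TrG."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(
        default_factory=lambda: {"CvG": {}, "OdG": {}, "TrG": {}}
    )

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, layer, a, b, weight=1.0):
        # Edges are stored undirected within the chosen layer.
        adjacency = self.edges[layer]
        adjacency.setdefault(a, {})[b] = weight
        adjacency.setdefault(b, {})[a] = weight

    def neighbors(self, layer, a):
        # Sorted neighbor ids of node `a` in one layer.
        return sorted(self.edges[layer].get(a, {}))


topo_map = TopometricMap()
for i in range(3):
    topo_map.add_node(Node(i, f"frame_{i}.jpg", timestamp=float(i)))
topo_map.add_edge("OdG", 0, 1)              # consecutive keyframes
topo_map.add_edge("OdG", 1, 2)
topo_map.add_edge("CvG", 0, 2, weight=0.8)  # frames that see the same scene
```

Keeping the layers as separate adjacency maps over a shared node set means a planner can query only the traversability layer while localization queries only the covisibility layer.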

Collaborative Localization Pipeline

Collaborative localization and map merging are formulated as a hierarchical pipeline:

Topological Localization: Discriminative global image descriptors and a dynamic programming (DP) sequence matcher efficiently establish candidate loop closures, even for disjoint or irregularly overlapping trajectories. Geometric verification (GV) based on feature matches and RANSAC discards false positives, ensuring high precision and recall in data association.
Metric Localization: A GFM (MASt3R) predicts dense, scale-aware pointmaps and per-pixel confidence maps for query-reference image pairs. A global optimization framework then aligns these predictions while explicitly calibrating the confidence maps via a robust loss (e.g., a Geman-McClure kernel within iteratively reweighted least squares, IRLS), yielding multi-view-consistent pose, scale, and geometry. The resulting confidence-calibrated map (CCM) provides a principled way to reject unreliable loop closures.
Pose Graph Optimization (PGO): Intra-map odometry and inter-map metric constraints are integrated in a global factor graph, which robustly optimizes the joint pose set for all submaps using the calibrated covariance of loop closure edges.
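
The DP sequence matching step in the pipeline above can be sketched as a monotone alignment over a descriptor distance matrix. This is a generic illustration, not the paper's formulation: the cosine distance, the `max_step` constraint, and the toy 2D descriptors are all assumptions.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dp_sequence_match(query, reference, max_step=2):
    """Assign each query descriptor a reference index along a monotone path.

    Consecutive query frames may advance by 0..max_step reference frames;
    the path with the lowest summed descriptor distance is returned.
    """
    n, m = len(query), len(reference)
    cost = [[cosine_distance(q, r) for r in reference] for q in query]
    best = [[math.inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    best[0] = cost[0][:]
    for i in range(1, n):
        for j in range(m):
            for step in range(max_step + 1):
                k = j - step
                if k < 0:
                    break
                candidate = best[i - 1][k] + cost[i][j]
                if candidate < best[i][j]:
                    best[i][j] = candidate
                    back[i][j] = k
    # Backtrack from the cheapest end point.
    j = min(range(m), key=lambda c: best[n - 1][c])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path


def unit(deg):
    return (math.cos(math.radians(deg)), math.sin(math.radians(deg)))


reference = [unit(d) for d in (0, 20, 40, 60, 80, 100)]
query = [unit(d) for d in (40, 60, 80)]
matches = dp_sequence_match(query, reference)
```

Matching whole sequences rather than single frames lets temporal ordering disambiguate places that look alike in isolation; the resulting candidates would still pass through geometric verification before being accepted.
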
Figure 2: The raw and calibrated confidence maps of the query image after global alignment; confidence values are adaptively down-weighted in regions of high residual error.
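
To make the confidence calibration idea concrete, the toy sketch below runs IRLS with a Geman-McClure weight on a one-parameter problem: estimating a single scale factor between predicted and reference depths. The scalar setup, the `sigma` value, and the function names are assumptions for illustration; the actual system optimizes pose, scale, and geometry jointly.

```python
def geman_mcclure_weight(residual, sigma):
    """IRLS weight for rho(r) = sigma^2 r^2 / (sigma^2 + r^2), normalized to 1 at r = 0."""
    s2 = sigma * sigma
    return (s2 / (s2 + residual * residual)) ** 2


def irls_scale(pred, target, conf, sigma=0.5, iters=20):
    """Estimate scale s minimizing the robust cost of (s * pred - target).

    Returns the scale and the calibrated confidences, i.e. the raw
    confidences multiplied by the final robust weights.
    """
    weights = [1.0] * len(pred)
    scale = 1.0
    for _ in range(iters):
        # Weighted least-squares update for the scale (closed form).
        num = sum(w * c * p * t for w, c, p, t in zip(weights, conf, pred, target))
        den = sum(w * c * p * p for w, c, p in zip(weights, conf, pred))
        scale = num / den
        # Re-weight each sample from its current residual.
        weights = [
            geman_mcclure_weight(scale * p - t, sigma)
            for p, t in zip(pred, target)
        ]
    calibrated = [c * w for c, w in zip(conf, weights)]
    return scale, calibrated


pred = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [2.0, 4.0, 6.0, 8.0, 30.0]   # last sample is a gross outlier
raw_conf = [1.0] * 5
scale, calibrated_conf = irls_scale(pred, target, raw_conf)
```

In this toy case the estimate settles near the inlier scale of 2 and the outlier's calibrated confidence collapses toward zero, mirroring the down-weighting of high-residual regions described in the Figure 2 caption.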

Cross-Device and Heterogeneous Mapping

The architecture natively supports cross-device data fusion through pre-processing and image quality assessment (IQA) filtering. Virtual perspective projections from equirectangular panoramas and undistortion routines ensure geometric consistency for foundation model inference. A learned MUSIQ-based IQA model filters perceptually degraded inputs. This enables robust operation in both crowd-sourced and autonomous multi-platform contexts, removing the dependency on calibrated, high-end sensing arrays.
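
The virtual perspective projection from an equirectangular panorama can be sketched as a per-pixel lookup: each pixel of a virtual pinhole camera defines a ray, which is converted to longitude/latitude and then to panorama coordinates. The axis conventions (x right, y down, z forward, positive yaw turning right) and the function name are assumptions for this example, not details taken from the source.

```python
import math


def perspective_to_equirect(u, v, width, height, fov_deg, yaw_deg, pano_w, pano_h):
    """Map pixel (u, v) of a virtual pinhole view to panorama coordinates.

    The pinhole camera has horizontal field of view `fov_deg` and is rotated
    by `yaw_deg` about the vertical axis. Returns fractional (px, py) in the
    equirectangular image, where the panorama center is longitude 0, latitude 0.
    """
    focal = 0.5 * width / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel in the camera frame.
    x = u - 0.5 * width
    y = v - 0.5 * height
    z = focal
    # Rotate the ray about the vertical axis.
    yaw = math.radians(yaw_deg)
    xr = math.cos(yaw) * x + math.sin(yaw) * z
    zr = -math.sin(yaw) * x + math.cos(yaw) * z
    lon = math.atan2(xr, zr)                   # in [-pi, pi]
    lat = math.atan2(-y, math.hypot(xr, zr))   # positive looking up
    px = (lon / (2.0 * math.pi) + 0.5) * pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py


# Center pixel of a 640x480 view with 90 degree FOV, facing 90 degrees to the right.
px, py = perspective_to_equirect(320, 240, 640, 480, 90.0, 90.0, 4096, 2048)
```

Evaluating this mapping for every pixel and sampling the panorama at the returned coordinates (for example with bilinear interpolation) would produce the pinhole image that a foundation model expects.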

Lifelong Map Maintenance via Node Culling and Connectivity Augmentation

As maps grow, a probabilistic node culling strategy operates on information contribution metrics that combine image quality, temporal recency, and geometric novelty (information gain). Nodes offering little marginal value are selectively culled post-PGO, maintaining a compact but informative map topology suitable for long-term operation. Edge augmentation dynamically densifies the CvG and TrG as new evidence of spatial or covisible connectivity emerges.
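
One way to picture probabilistic culling is to turn the three cues named above into a keep probability and sample against it. The linear weighting, the exponential recency decay, and all parameter values below are assumptions chosen for illustration; the source does not give the actual scoring function.

```python
import random


def information_score(quality, age_days, novelty,
                      half_life_days=30.0, weights=(0.3, 0.3, 0.4)):
    """Combine image quality, temporal recency, and geometric novelty.

    `quality` and `novelty` are assumed to lie in [0, 1]; recency decays
    by half every `half_life_days`.
    """
    recency = 0.5 ** (age_days / half_life_days)
    w_quality, w_recency, w_novelty = weights
    return w_quality * quality + w_recency * recency + w_novelty * novelty


def cull_nodes(nodes, rng):
    """Keep each node with probability equal to its clamped score."""
    kept = []
    for node in nodes:
        score = information_score(node["quality"], node["age_days"], node["novelty"])
        keep_probability = max(0.0, min(1.0, score))
        if rng.random() < keep_probability:
            kept.append(node["id"])
    return kept


nodes = [
    {"id": 1, "quality": 1.0, "age_days": 0.0, "novelty": 1.0},   # always kept
    {"id": 2, "quality": 0.0, "age_days": 1e6, "novelty": 0.0},   # always culled
    {"id": 3, "quality": 0.5, "age_days": 30.0, "novelty": 0.5},  # kept about half the time
]
kept_ids = cull_nodes(nodes, random.Random(0))
```

Sampling instead of hard thresholding keeps some low-scoring nodes alive by chance, which can preserve coverage in regions where every observation is old or of modest quality.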

Numerical Results and Empirical Analysis

Localization and Mapping Benchmarks

Topological Localization: On the self-collected dataset spanning $>18$ km across varied environments, CosPlace and the proposed DP-based sequence matching with GV achieve top-1 precision/recall exceeding 86%/75%, outperforming both traditional (SeqSLAM, NetVLAD) and deep classification-based place recognition models under large viewpoint and temporal variation.

Metric Localization: On Map-Free, GZ-Campus, and self-collected datasets, OpenNavMap’s GFM-based localization reaches a 0.62 m average translation error on Map-Free, keeps ATE $<3$ m in large-scale map merging, and localizes $>80\%$ of queries correctly within $1$ m / $10^\circ$ using just two references. Structure-based pipelines (COLMAP/HLoc) degrade significantly under view sparsity or long baselines where triangulation is poorly conditioned, requiring $>5$ references for similar accuracy.
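
For readers unfamiliar with the "correct within 1 m / 10°" criterion, the sketch below shows how such a success rate is commonly computed from estimated and ground-truth poses. The helper names and the toy poses are assumptions; the benchmark's own evaluation code may differ in detail.

```python
import math


def rot_z(deg):
    """3x3 rotation about the z axis, as nested lists."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_deg(r_est, r_gt):
    """Geodesic angle between two rotations: acos((trace(R_est^T R_gt) - 1) / 2)."""
    trace = sum(r_est[k][i] * r_gt[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    return math.dist(t_est, t_gt)


def success_rate(estimates, ground_truth, max_t=1.0, max_r_deg=10.0):
    """Fraction of poses within both the translation and rotation thresholds."""
    hits = 0
    for (r_est, t_est), (r_gt, t_gt) in zip(estimates, ground_truth):
        if (translation_error(t_est, t_gt) <= max_t
                and rotation_error_deg(r_est, r_gt) <= max_r_deg):
            hits += 1
    return hits / len(estimates)


ground_truth = [(rot_z(0.0), (0.0, 0.0, 0.0))] * 2
estimates = [
    (rot_z(5.0), (0.5, 0.0, 0.0)),    # within 1 m and 10 degrees
    (rot_z(20.0), (0.2, 0.0, 0.0)),   # rotation error too large
]
rate = success_rate(estimates, ground_truth)
```

A pose counts as correct only if both thresholds hold, so a small translation error cannot compensate for a large rotation error.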

Figure 4: Estimated multi-session trajectory by OpenNavMap with GT from Aria Glasses; absolute trajectory error remains $<$ 3 m over a $15$ km dataset.

Map Size: The image-based, structure-free map requires an order of magnitude less storage than explicit 3D maps, e.g., $1.3$ MB per scene versus $20$ MB for feature descriptor-based maps at equivalent coverage.

Heterogeneous and Lifelong Mapping

On the 360Loc dataset (cross-device, panoramic-to-pinhole conversion), translational/rotational ATE remains around $1$ m / $1^\circ$, demonstrating resilient merging despite multi-session, multi-intrinsic, and GPS-denied mapping. On self-collected, multi-platform maps ($>3.5$ months, heterogeneous devices and street-view panoramas), the system supports global path planning and visual navigation even to goal images never observed by the robot, with cumulative maps covering up to $15.7$ km.

Twelve autonomous image-goal navigation tasks using OpenNavMap as the spatial backbone demonstrate practical viability: robots navigate point-to-point using only visual topometry, without external metric priors, sustaining robust operation indoors, outdoors, and under significant environmental/illumination changes.

Implications and Future Prospects

From a theoretical perspective, OpenNavMap redefines the representational trade-off between map compactness and metric accuracy by decoupling dense 3D structure storage from localization and planning capability. Leveraging GFMs mitigates the limitations of geometric feature sparsity, permits scalable crowd-sourced mapping, and abstracts hardware-specific calibration.

Practically, the topometric paradigm, where image nodes and traversability/covisibility/odometry edges constitute the operational backbone, enables real-time navigation and planning in unprepared, dynamic, or GPS-denied spaces with limited resource requirements. The cross-device compatibility and probabilistic culling ensure compactness and adaptability for lifelong autonomy, supporting continuous data integration from arbitrary user sources.

Challenges remain in scaling foundation model inference for real-time embedded applications, improving adaptation under extreme cross-device domain shifts, and decentralizing collaborative mapping protocols. Future research directions include explicit exploitation of underlying graph topology for non-sequential matching, enhancement of GFM robustness, and distributed map maintenance enabling truly global lifelong mapping.

Conclusion

OpenNavMap establishes a new baseline for large-scale, collaborative, structure-free visual navigation and mapping. By combining sparse, image-indexed topometric maps with on-demand 3D scene inference via foundation models, and integrating dynamic, robust data association, the system matches or exceeds prior state-of-the-art structure-based pipelines on both metric accuracy and operational scalability, with significantly reduced storage and computational cost. This paradigm is well-aligned with the increasing heterogeneity and ubiquity of modern visual mapping platforms, supporting robust robotic perception and navigation in complex, changing, and unstructured environments.

