
Open Set Semantic Mapping

Updated 10 February 2026
  • Open set semantic mapping is a method that constructs spatial maps with open-vocabulary labels instead of fixed classes, allowing zero-shot recognition of novel objects.
  • It integrates multi-view semantic fusion with vision–language models and foundation embeddings to accurately segment and label both known and unseen entities.
  • The approach supports efficient spatial reasoning in robotics by enabling runtime, query-based mapping and adaptive scene interpretation without retraining.

Open set semantic mapping denotes the construction of a spatial or object-level world representation in which semantic labels are not restricted to a fixed category set defined at training time. Instead, the mapping process leverages open-vocabulary segmentation, vision–language models, or foundation model embeddings to recognize, segment, and encode both previously seen and truly novel, user-specified categories without retraining, enabling robot agents and interactive systems to operate in previously unseen environments and handle concepts beyond closed-set limits (Yang et al., 3 Mar 2025, Popov et al., 13 Mar 2025, Jatavallabhula et al., 2023, Sheppard et al., 15 Dec 2025, Maggio et al., 7 Mar 2025, Loo et al., 2024, Günther et al., 3 Feb 2026, Xie et al., 17 Jul 2025, Yoo et al., 9 Dec 2025, Alama et al., 9 Apr 2025, Singh et al., 2024, Singh et al., 2024). Unlike closed-set approaches, which hard-code class vocabularies into network weights and map structures, open-set semantic mapping exposes explicit, extensible semantics in map elements (voxels, Gaussians, surfels, graph nodes), enabling run-time, zero-shot queries and high-level symbolic reasoning for robotics and spatial AI.

1. Problem Definition, Motivation, and Challenges

Open set semantic mapping aims to endow autonomous agents with the ability to construct spatial representations where the set of semantic concepts is not restricted to a pre-specified label set C = \{c_1, \ldots, c_C\} but is effectively unbounded. Formally, each map element may be tagged with an open-vocabulary label or a high-dimensional embedding f_k \in \mathbb{R}^d, supporting arbitrary zero-shot queries via similarity search, prompt-based segmentation, or LLM-driven reasoning (Jatavallabhula et al., 2023, Maggio et al., 7 Mar 2025, Popov et al., 13 Mar 2025, Sheppard et al., 15 Dec 2025, Yoo et al., 9 Dec 2025, Xie et al., 17 Jul 2025, Singh et al., 2024). This setting is motivated by fundamental limitations in closed-set mapping: robots cannot handle novel objects, long-tail classes, or rapidly changing environments without incurring costly retraining or manual annotation.
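Zero-shot querying of such an embedding map reduces to nearest-neighbor search in feature space. A minimal sketch, assuming each map element already stores a feature vector f_k (e.g., a CLIP image embedding) and the text query has been encoded into the same space (the function names and threshold here are illustrative, not from any cited system):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def query_map(elements, text_feature, threshold=0.3):
    """Return map elements whose stored embedding matches an
    open-vocabulary text query above a similarity threshold,
    best match first."""
    hits = []
    for elem_id, f_k in elements.items():
        score = cosine(f_k, text_feature)
        if score >= threshold:
            hits.append((elem_id, score))
    return sorted(hits, key=lambda h: -h[1])

# Toy 3-D "embeddings" standing in for real high-dimensional features.
elements = {"voxel_1": [1.0, 0.0, 0.0], "voxel_2": [0.0, 1.0, 0.0]}
print(query_map(elements, [0.9, 0.1, 0.0]))  # only voxel_1 passes the threshold
```

Because the vocabulary never appears in the map itself, any text prompt encodable by the foundation model can be issued at run time without retraining.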

Key challenges in open-set semantic mapping include grounding an unbounded vocabulary in 3D geometry, fusing noisy multi-view open-vocabulary predictions consistently, associating observations without fixed class identities, and bounding the memory and compute cost of dense embedding maps (see Section 6).

2. Fundamental Representations and Semantic Storage

Open-set semantic mapping systems employ a range of representations to encode geometry and semantics:

| Method/Framework | Map Element | Semantic Representation | Reference |
|---|---|---|---|
| ConceptFusion | point/surfel | CLIP embedding vector f_k | (Jatavallabhula et al., 2023) |
| OpenGS-SLAM, Bayesian Fields, OpenMonoGS-SLAM | 3D Gaussians | explicit open-vocab label or feature | (Yang et al., 3 Mar 2025, Maggio et al., 7 Mar 2025, Yoo et al., 9 Dec 2025) |
| SLIM-VDB | voxel | Dirichlet (closed) / NIG (open) prior | (Sheppard et al., 15 Dec 2025) |
| RayFronts | voxel + ray | language-aligned feature vector | (Alama et al., 9 Apr 2025) |
| Scene Graph Backed 3DSSG | graph node/edge | CLIP/DINO feature, open label text | (Günther et al., 3 Feb 2026) |
| LOSS-SLAM, Open-Set Loop Closure | sparse object node | DINO/MLP descriptor, uncertainty | (Singh et al., 2024, Singh et al., 2024) |
| OSG, osmAG-LLM | graph node | open-vocab label, natural language description | (Loo et al., 2024, Xie et al., 17 Jul 2025) |

Representations are grouped into dense volumetric/fusion structures (voxel, surfel, Gaussian splat) and sparse, relational structures (scene graph, object node). Geometry is recovered via standard volumetric fusion, point-based mapping, or recent 3D Gaussian Splatting (3DGS) techniques (Yang et al., 3 Mar 2025, Maggio et al., 7 Mar 2025, Yoo et al., 9 Dec 2025). Semantic attributes are stored either as explicit labels from open-vocabulary detectors and promptable segmenters (e.g., YOLO-World, SAM), as feature embeddings from foundation models (e.g., CLIP, DINO), or as probabilistic densities whose support grows at run time (Sheppard et al., 15 Dec 2025, Alama et al., 9 Apr 2025, Jatavallabhula et al., 2023).
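The two storage strategies (explicit labels vs. feature embeddings) can coexist per map element, as several of the tabulated systems do. A hedged sketch of one plausible per-voxel record, accumulating a running-mean embedding alongside open-vocabulary label votes (the class name and update rules are illustrative, not any specific system's implementation):

```python
from collections import Counter

class VoxelSemantics:
    """Per-voxel semantic state: a running-mean feature embedding
    plus open-vocabulary label votes from 2D observations."""
    def __init__(self, dim):
        self.feature = [0.0] * dim   # fused embedding
        self.count = 0               # number of fused observations
        self.votes = Counter()       # open-vocab label -> vote count

    def fuse(self, feature, label=None):
        # Incremental mean over multi-view observations.
        self.count += 1
        self.feature = [f + (x - f) / self.count
                        for f, x in zip(self.feature, feature)]
        if label is not None:
            self.votes[label] += 1   # label consensus by voting

    def best_label(self):
        return self.votes.most_common(1)[0][0] if self.votes else None

v = VoxelSemantics(dim=3)
v.fuse([1.0, 0.0, 0.0], "mug")
v.fuse([0.8, 0.2, 0.0], "mug")
v.fuse([0.9, 0.1, 0.0], "cup")
print(v.best_label(), v.feature)  # consensus label with averaged feature
```

The fused embedding supports later zero-shot queries even for concepts no detector ever proposed as a label, while the vote counter yields a cheap discrete label when one is needed.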

3. Core Algorithmic Building Blocks

Common algorithmic modules across the open-set semantic mapping literature include open-vocabulary 2D detection and segmentation (e.g., YOLO-World, SAM), multi-view fusion of features or labels into the map representation, feature-space data association, label voting and consensus, and clustering for object-level grouping (see Sections 2 and 4).

4. Scene Graphs, Relational Structure, and Symbolic Interfaces

Open-set semantic maps increasingly expose their internal state as explicitly structured, symbolic data—scene graphs—for downstream spatial reasoning:

  • 3D Semantic Scene Graphs (3DSSGs): Serve as the live backend, fusing geometric, semantic, and relational data. Nodes represent objects, places, frames; edges encode adjacency, containment, or semantic relations (Günther et al., 3 Feb 2026, Loo et al., 2024, Xie et al., 17 Jul 2025). Open-vocabulary labels and high-dimensional features are maintained per node for extension.
  • Incremental Refinement and Data Association: Each observation can yield new nodes, merges, or updated relations, with data association carried out via spatial proximity, IoU, and feature-space similarity (e.g., DINO, CLIP) and Bayesian models (Günther et al., 3 Feb 2026, Yang et al., 3 Mar 2025).
  • Hierarchical and Layered Graphs: Multi-level graphs represent not just "things" but region, place, or abstraction layers (room, floor, building), linked by configurable edge types (is_near, contains, connects_to) (Loo et al., 2024).
  • Semantic Uncertainty and Open-Set Graph Matching: Graph-based loop closure and object association explicitly incorporate feature uncertainty and support both underwater and terrestrial deployment (Singh et al., 2024).
  • Symbolic/LLM Integration: High-level reasoning modules (e.g., LLM planners or VQA systems) interface directly with open set semantic graphs, enabling spatial question answering, navigation, and zero-shot object retrieval by reasoning over open-vocabulary entities (Loo et al., 2024, Xie et al., 17 Jul 2025, Popov et al., 13 Mar 2025).
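Data association of the kind described above is commonly scored as a weighted blend of spatial overlap and feature-space similarity. A minimal greedy sketch, assuming axis-aligned 2D boxes for geometry (the weights, threshold, and function names are illustrative):

```python
import math

def iou(a, b):
    """IoU of axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))

def associate(detection, nodes, w_geo=0.5, w_feat=0.5, thresh=0.5):
    """Match a new (box, feature) detection to an existing graph node.

    Returns the best node id above the combined-score threshold,
    or None, signalling that a new node should be inserted."""
    best_id, best_score = None, thresh
    for node_id, (box, feat) in nodes.items():
        score = (w_geo * iou(detection[0], box) +
                 w_feat * cosine(detection[1], feat))
        if score > best_score:
            best_id, best_score = node_id, score
    return best_id

nodes = {"chair_3": ((0.0, 0.0, 1.0, 1.0), [1.0, 0.0])}
print(associate(((0.1, 0.1, 1.1, 1.1), [0.9, 0.1]), nodes))  # -> chair_3
```

The None return is what makes the scheme open-set: an unmatched detection spawns a new node with its own label and feature, rather than being forced into a fixed class.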

5. Quantitative Evaluation and Benchmarking

Open-set semantic mapping performance is quantitatively measured via closed-set segmentation metrics (e.g., mIoU, per-class accuracy) on annotated benchmarks, and via open-vocabulary query metrics such as retrieval or localization success for free-form text prompts.

6. Empirical Results, Limitations, and Future Research

Empirical studies consistently show that open-set semantic mapping enables zero-shot recognition of categories absent from any training vocabulary, and supports query-driven retrieval and spatial reasoning without retraining (see the frameworks surveyed in Section 7).

Key limitations and failure modes include:

  • Feature-space and prompt sensitivity: Performance hinges on the coverage and alignment of foundation model embedding spaces (e.g., CLIP, DINO) and may be affected by object occlusion, view angle, or prompt ambiguity (Maggio et al., 7 Mar 2025, Jatavallabhula et al., 2023).
  • Clustering and segmentation issues: Over/under-segmentation, hyperparameter sensitivity, and spurious splitting may degrade object-level granularity (Maggio et al., 7 Mar 2025, Nanwani et al., 2023, Singh et al., 2024).
  • Memory and compute trade-offs: Though substantial improvements are achieved, dense embedding maps remain memory intensive for large environments unless appropriately pruned or compressed (Sheppard et al., 15 Dec 2025, Alama et al., 9 Apr 2025).
  • Lighting/Environmental robustness: Scene segmentation can be impacted by difficult photometric conditions or dynamic contents, motivating development of photometric-invariant and temporally consistent mapping approaches (Popov et al., 13 Mar 2025).
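One common mitigation for the memory cost noted above is to compress per-element embeddings before storage. A toy int8 quantization sketch, roughly 4x smaller than float32 storage (the scheme is a generic illustration, not taken from any cited system):

```python
def quantize(feature):
    """Compress a float embedding to int8 codes plus one scale factor."""
    scale = max(abs(x) for x in feature) / 127.0 or 1.0  # avoid zero scale
    return [round(x / scale) for x in feature], scale

def dequantize(codes, scale):
    """Recover an approximate float embedding for similarity queries."""
    return [c * scale for c in codes]

codes, scale = quantize([0.5, -0.25, 0.125])
approx = dequantize(codes, scale)
print(codes, approx)  # small reconstruction error, ~4x less memory
```

Because zero-shot queries only need approximate cosine similarity, modest quantization error typically leaves retrieval rankings intact; heavier schemes (product quantization, shared codebooks) trade more compute for further compression.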

Future research directions highlighted include hybrid 2D–3D fusion architectures, explicit "unknown" class detection and quantification, scene graph-based planning, open-set object discovery and loop-closure, and end-to-end co-training of geometry and semantics under variable sensor and environmental conditions (Popov et al., 13 Mar 2025, Maggio et al., 7 Mar 2025, Loo et al., 2024, Günther et al., 3 Feb 2026, Yoo et al., 9 Dec 2025).

7. Principal Frameworks and Systematic Taxonomy

Major recent frameworks exemplifying state-of-the-art open set semantic mapping methodologies include:

  • OpenGS-SLAM: Dense semantic SLAM based on 3D Gaussian Splatting, open-vocabulary 2D foundation model integration, explicit label voting and consensus, and segmentation pruning for high efficiency and accuracy (Yang et al., 3 Mar 2025).
  • ConceptFusion: Pixel-aligned, multimodal zero-shot feature fusion with open-vocabulary querying for complex environments (Jatavallabhula et al., 2023).
  • SLIM-VDB: Probabilistic Bayesian fusion over sparse volumetric OpenVDB grids with Dirichlet and Normal–Inverse-Gamma semantic priors, supporting unbounded label insertion (Sheppard et al., 15 Dec 2025).
  • Bayesian Fields: Task-driven semantic mapping using probabilistic multi-view fusion and information bottleneck-based clustering to yield adaptive object granularity (Maggio et al., 7 Mar 2025).
  • RayFronts: Cooperative in-range/out-of-range semantic mapping via fused voxels and "semantic ray frontiers", permitting dense local and exploratory global inference (Alama et al., 9 Apr 2025).
  • Scene Graph Backed 3DSSG: Online, persistent 3D semantic scene graphs as the live backend for efficient symbolic reasoning and hierarchical place-object semantics (Günther et al., 3 Feb 2026).
  • LOSS-SLAM and Open-Set Loop Closure: Lightweight, factor graph-based approaches for object-level SLAM, open-set data association, and uncertainty-aware object discovery (Singh et al., 2024, Singh et al., 2024).
  • Open Scene Graphs (OSG) and osmAG-LLM: Topo-semantic graphs enabling integration with LLMs for open-world object-goal navigation and zero-shot spatial reasoning (Loo et al., 2024, Xie et al., 17 Jul 2025).
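The probabilistic fusion mentioned for SLIM-VDB can be illustrated with the standard Dirichlet–categorical conjugate update; this is a textbook sketch of the closed-form part only (SLIM-VDB's open-set Normal–Inverse-Gamma model is more involved), with illustrative names:

```python
class DirichletLabelFusion:
    """Bayesian multi-view label fusion with a Dirichlet prior:
    each observation of class c increments its pseudo-count alpha_c."""
    def __init__(self, labels, prior=1.0):
        self.alpha = {c: prior for c in labels}

    def update(self, label):
        # Unseen labels can be inserted at run time (open-set insertion).
        self.alpha[label] = self.alpha.get(label, 1.0) + 1.0

    def posterior(self):
        """Posterior mean class probabilities."""
        total = sum(self.alpha.values())
        return {c: a / total for c, a in self.alpha.items()}

fusion = DirichletLabelFusion(["chair", "table"])
for obs in ["chair", "chair", "lamp"]:   # "lamp" was not in the prior set
    fusion.update(obs)
p = fusion.posterior()
print(max(p, key=p.get))  # most probable label after fusion
```

Conjugacy keeps the per-element update O(1), and letting the support grow with new labels is what turns the classical closed-set recursion into an open-set one.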

These systems illustrate the current convergence of geometric mapping, open vocabulary semantic fusion, real-time symbolic knowledge graphs, and LLM-enabled reasoning in scalable, robust open-set semantic mapping.
