vMAP: Vectorised Object Mapping for Neural Field SLAM

Published 3 Feb 2023 in cs.CV | (2302.01838v2)

Abstract: We present vMAP, an object-level dense SLAM system using neural field representations. Each object is represented by a small MLP, enabling efficient, watertight object modelling without the need for 3D priors. As an RGB-D camera browses a scene with no prior information, vMAP detects object instances on-the-fly, and dynamically adds them to its map. Specifically, thanks to the power of vectorised training, vMAP can optimise as many as 50 individual objects in a single scene, with an extremely efficient training speed of 5Hz map update. We experimentally demonstrate significantly improved scene-level and object-level reconstruction quality compared to prior neural field SLAM systems. Project page: https://kxhit.github.io/vMAP.




Summary

The paper presents a novel object-level dense SLAM system that models each object with its own MLP, producing precise, watertight reconstructions without relying on 3D shape priors.
The paper implements vectorised training for up to 50 object models on GPU, achieving a map update speed of 5Hz while maintaining high reconstruction quality.
The paper reports 50-70% better object-level completion than prior systems on the simulated Replica dataset, and shows robust reconstruction on the real-world ScanNet and TUM RGB-D datasets.

Overview of "vMAP: Vectorised Object Mapping for Neural Field SLAM"

The paper "vMAP: Vectorised Object Mapping for Neural Field SLAM" introduces an object-level dense SLAM system built on neural field representations. The system models each object as a separate small multilayer perceptron (MLP), which allows watertight object reconstruction without relying on pre-existing 3D shape priors.

Key Contributions

Object-Level Representation: vMAP models each object independently with its own small MLP, so every semantic entity in the scene has a dedicated representation. This keeps computation and memory usage low and gives fine-grained control over how each object is reconstructed.
Vectorised Training: A significant contribution is vectorised training, which optimises up to 50 object models in parallel on a GPU. This is achieved without degrading reconstruction quality and reaches a map update speed of 5Hz. The system outperforms prior neural field SLAM systems in both scene-level and object-level reconstruction quality.
No 3D Priors Required: Unlike traditional methods that rely on 3D shape priors such as CAD models, vMAP leverages the capacity of MLPs to generate watertight reconstructions even when objects are only partially observed, a capability demonstrated in both simulated and real-world environments.
Disentangled Object Representation: The paper emphasizes the advantages of a disentangled object representation, which permits dynamic scene composition and independent object manipulation within the mapping framework.
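
The idea behind vectorised training can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it uses NumPy rather than a deep learning framework, covers only the forward pass, and the layer sizes, the 4-channel output (assumed here to be colour plus occupancy), and all function names are hypothetical. It shows that stacking the weights of many small per-object MLPs along a leading axis lets a single batched matrix multiply replace a Python loop over objects.

```python
import numpy as np

def init_object_mlps(num_objects, in_dim=3, hidden=32, out_dim=4, seed=0):
    """Stack the weights of `num_objects` independent small MLPs along axis 0.

    Sizes are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (num_objects, in_dim, hidden)),
        "b1": np.zeros((num_objects, 1, hidden)),
        "w2": rng.normal(0.0, 0.1, (num_objects, hidden, out_dim)),
        "b2": np.zeros((num_objects, 1, out_dim)),
    }

def forward_loop(params, points):
    """Reference version: evaluate each object's MLP one at a time."""
    outputs = []
    for k in range(points.shape[0]):
        hidden = np.maximum(points[k] @ params["w1"][k] + params["b1"][k], 0.0)
        outputs.append(hidden @ params["w2"][k] + params["b2"][k])
    return np.stack(outputs)

def forward_vectorised(params, points):
    """Evaluate all object MLPs at once with batched matrix multiplies."""
    hidden = np.maximum(np.matmul(points, params["w1"]) + params["b1"], 0.0)
    return np.matmul(hidden, params["w2"]) + params["b2"]

# 50 objects, each with its own batch of sampled 3D points.
num_objects, samples_per_object = 50, 120
params = init_object_mlps(num_objects)
points = np.random.default_rng(1).uniform(
    -1.0, 1.0, (num_objects, samples_per_object, 3)
)
out = forward_vectorised(params, points)  # shape: (objects, samples, channels)
```

Because each object keeps its own slice of the stacked weights, the models remain independent: adding or removing an object only changes the size of the leading axis, while the per-object computation is unchanged.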

Experimental Validation

The paper presents extensive experiments comparing vMAP with existing neural field SLAM systems such as iMAP and NICE-SLAM. On the Replica dataset, vMAP achieves 50-70% better object-level completion than these systems.

The evaluations also include:

Scene and Object Reconstruction: vMAP’s ability to maintain high-quality reconstruction even in noisy real-world conditions was validated, particularly using the ScanNet and TUM RGB-D datasets.
Efficient Runtime and Memory Usage: Because each object is represented by a small MLP and all objects are trained in parallel, vMAP keeps compute and memory requirements low while sustaining a 5Hz map update, a crucial factor for real-time applications.
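
To make the completion figures quoted above more concrete, here is a minimal sketch of how a completion score is commonly computed in dense reconstruction benchmarks: for each ground-truth point, take the distance to the nearest reconstructed point, then report the mean distance and the fraction of points within a threshold. This summary does not state the exact formula or threshold used in the paper, so the function name, the 5 cm default threshold, and the toy data are all assumptions.

```python
import numpy as np

def completion(gt_points, rec_points, threshold=0.05):
    """Return (mean nearest distance, fraction of GT points within threshold).

    gt_points:  (N, 3) points sampled from the ground-truth surface.
    rec_points: (M, 3) points sampled from the reconstructed surface.
    threshold:  distance in metres; 0.05 is an assumed default.
    """
    # Pairwise differences, shape (N, M, 3); fine for small point sets.
    diffs = gt_points[:, None, :] - rec_points[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)
    return float(nearest.mean()), float((nearest < threshold).mean())

# Toy example: the reconstruction misses one corner of a unit square.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
rec = gt[:3]
mean_dist, ratio = completion(gt, rec)
```

In the toy example, three of the four ground-truth points are matched exactly and the missing corner is 1 m from its nearest reconstructed neighbour, so the mean distance is 0.25 m and the completion ratio is 0.75. A lower mean distance and a higher ratio both indicate a more complete reconstruction.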

Implications and Future Prospects

The introduction of vMAP marks an important step toward more sophisticated real-time scene understanding systems, particularly in robotics and augmented reality, where accurate and fast object-level SLAM is invaluable. By eliminating the reliance on 3D shape priors, vMAP can be employed in environments with unknown or novel objects, enhancing its robustness and adaptability.

Speculatively, future work could integrate more advanced object segmentation techniques and adaptive MLP configurations to further improve the accuracy and efficiency of vMAP. Integration with monocular systems and incremental improvements to the vectorised training scheme are also promising research directions.

In summary, vMAP advances neural field SLAM through object-level reconstruction with one small MLP per object, providing a flexible and efficient platform for interactive vision applications.