Markerless Robot Detection and 6D Pose Estimation for Multi-Agent SLAM

Published 18 Feb 2026 in cs.RO | (2602.16308v1)

Abstract: The capability of multi-robot SLAM approaches to merge localization history and maps from different observers is often challenged by the difficulty in establishing data association. Loop closure detection between perceptual inputs of different robotic agents is easily compromised in the context of perceptual aliasing, or when perspectives differ significantly. For this reason, direct mutual observation among robots is a powerful way to connect partial SLAM graphs, but often relies on the presence of calibrated arrays of fiducial markers (e.g., AprilTag arrays), which severely limits the range of observations and frequently fails under sharp lighting conditions, e.g., reflections or overexposure. In this work, we propose a novel solution to this problem leveraging recent advances in Deep-Learning-based 6D pose estimation. We feature markerless pose estimation as part of a decentralized multi-robot SLAM system and demonstrate the benefit to the relative localization accuracy among the robotic team. The solution is validated experimentally on data recorded in a test field campaign on a planetary analogous environment.

Summary

The paper introduces a deep-learning-based approach to markerless robot detection and 6D pose estimation that addresses the range and lighting limitations of fiducial-marker methods.
It combines the YOLOv7 object detector, a transformer-based 6D pose estimation network, and stereo visual inputs to improve data association and relative localization in a decentralized multi-robot SLAM system.
Experiments on synthetic and real-world data, including a field campaign in a planetary-analog environment, show improvements in detection range, pose accuracy, and overall localization performance.

Introduction

The paper "Markerless Robot Detection and 6D Pose Estimation for Multi-Agent SLAM" (2602.16308) addresses the data association problem in multi-robot SLAM. Direct mutual observations between robots can connect their partial SLAM graphs, but existing multi-robot systems typically rely on calibrated arrays of fiducial markers such as AprilTags for these observations, which limits the observation range and frequently fails under harsh lighting such as reflections or overexposure. The paper replaces the markers with deep-learning-based markerless detection and 6D pose estimation, allowing the SLAM system to estimate the relative poses of robots within the team without any markers. Experiments in a planetary-analog environment demonstrate the resulting gain in relative localization accuracy among the agents.

Decentralized SLAM System

The study builds on a decentralized multi-robot SLAM system designed for stereo-camera-equipped robots such as the Lightweight Rover Unit (LRU) and UAVs such as ARDEA. On each robot, a Local Reference Filter fuses visual odometry (VO), IMU measurements, and additional odometry sources into a local state estimate, which allows the environment to be partitioned into submaps. Visual loop closures are formed by matching and registering overlapping submaps. The decentralized architecture also accepts inter-robot pose measurements derived from visual detections and estimates the corresponding SLAM graph constraints between agents robustly.

Figure 1: Schematic overview of the employed decentralized SLAM system, focusing on multi-robot detection capabilities.
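
To make the idea of an inter-robot pose measurement concrete, the following minimal Python/NumPy sketch shows how one detection of robot B by robot A could be turned into a relative-pose constraint, and how a residual could be evaluated against the current pose estimates. It assumes 4x4 homogeneous transforms, a known camera mounting pose on robot A, and both robot poses already expressed in a shared frame; in the actual decentralized system the constraint would also have to account for the unknown offset between the robots' reference frames. All function names (`make_pose`, `rot_z`, `inter_robot_measurement`, `constraint_residual`) are hypothetical and not taken from the paper.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    """Rotation about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def inter_robot_measurement(T_a_cam, T_cam_b):
    """Relative pose of robot B's body frame expressed in robot A's body frame.

    T_a_cam: camera mounting pose on robot A (maps camera coords to A's body).
    T_cam_b: detected pose of robot B in A's camera frame (B's body to camera).
    """
    return T_a_cam @ T_cam_b


def constraint_residual(T_w_a, T_w_b, Z_ab):
    """Discrepancy between predicted and measured relative pose.

    Assumes T_w_a and T_w_b live in the same (shared) world frame.
    Returns (translation error norm, rotation error angle in radians).
    """
    predicted = np.linalg.inv(T_w_a) @ T_w_b      # relative pose from estimates
    E = np.linalg.inv(Z_ab) @ predicted           # identity if they agree
    trans_err = np.linalg.norm(E[:3, 3])
    cos_angle = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return trans_err, np.arccos(cos_angle)
```

In a full SLAM back end, a residual of this form would typically be weighted by the detection's uncertainty and wrapped in a robust cost function, in line with the robust constraint estimation described above.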

Markerless Detection and Pose Estimation Methods

This work detects robots without markers using the YOLOv7 object detector, followed by a 6D pose estimation method inspired by Ulmer et al. that predicts dense 2D-to-3D correspondences. The adaptations include a transformer-based architecture and a neural-network-based 6D pose regression, which improve robustness to occlusions. Within SLAM, the markerless detection outputs are combined with the stereo visual inputs, with the aim of keeping inter-robot measurements accurate even when perception is aliased or ambiguous.

Figure 2: Illustration of the markerless detection pipeline, including 2D-to-3D correspondences and the pose regression network.
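
The paper regresses the 6D pose with a neural network. Purely as a point of comparison, the sketch below shows a classical, non-learned alternative for the same final step: lifting predicted 2D-to-3D correspondences to 3D camera coordinates with stereo disparity, then recovering the rigid pose by least-squares alignment (the Kabsch method, i.e. Umeyama without scale). It assumes rectified stereo with known intrinsics (`fx`, `fy`, `cx`, `cy`) and baseline, and that each pixel already carries a predicted 3D model coordinate. The function names are hypothetical, and this is not the authors' pipeline.

```python
import numpy as np


def backproject_stereo(uv, disparity, fx, fy, cx, cy, baseline):
    """Lift pixels with known disparity to 3D points in the left camera frame.

    Uses the rectified-stereo depth relation z = fx * baseline / disparity
    and the pinhole model x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    uv = np.asarray(uv, dtype=float)
    d = np.asarray(disparity, dtype=float)
    z = fx * baseline / d
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def kabsch(model_pts, cam_pts):
    """Least-squares rigid transform (R, t) such that cam_pts ~= R @ model_pts + t."""
    mu_m = model_pts.mean(axis=0)
    mu_c = cam_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (cam_pts - mu_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: flip the last axis if needed so that det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_c - R @ mu_m
    return R, t
```

This closed-form step has no built-in handling of occlusions or outlier correspondences, so in practice it is usually wrapped in a RANSAC loop; a learned regression head that handles such cases directly is one reason the learned alternative can be attractive.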

Experimental Validation

Evaluation on Synthetic Data

Synthetic data generated from simplified CAD models with the BlenderProc and OAISYS frameworks was used to train both the object detection and the pose estimation models. The diversity of the synthetic data improved model robustness, as measured by detection rates and pose accuracy on unseen samples.

Figure 3: Training samples from OAISYS and BlenderProc featuring LRU and Lander models.
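
Synthetic training sets for pose estimation are often rendered from camera viewpoints sampled around the object model. The sketch below illustrates that idea with plain NumPy: it samples camera positions at random distances, azimuths, and elevations around a CAD model and orients each camera toward it. It does not use the BlenderProc or OAISYS APIs; the function names, parameters, and ranges are hypothetical, and it assumes an OpenCV-style camera frame (+x right, +y down, +z forward) and a z-up world. The paper's actual sampling strategy may differ.

```python
import numpy as np


def look_at(cam_pos, target, up=(0.0, 0.0, 1.0)):
    """Camera-to-world rotation for a camera at cam_pos looking at target.

    Columns are the camera axes in world coordinates (OpenCV convention:
    +x right, +y down, +z forward along the viewing direction).
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    z = np.asarray(target, dtype=float) - cam_pos
    z /= np.linalg.norm(z)
    x = np.cross(z, np.asarray(up, dtype=float))   # right = forward x up
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                              # down, right-handed frame
    return np.stack([x, y, z], axis=1)


def sample_viewpoints(n, r_min, r_max, elev_min_deg, elev_max_deg,
                      target=(0.0, 0.0, 0.0), seed=None):
    """Sample n camera-to-world poses (4x4) on a spherical shell around target."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    poses = []
    for _ in range(n):
        r = rng.uniform(r_min, r_max)                              # distance
        az = rng.uniform(0.0, 2.0 * np.pi)                         # azimuth
        el = np.deg2rad(rng.uniform(elev_min_deg, elev_max_deg))   # elevation
        offset = r * np.array([np.cos(el) * np.cos(az),
                               np.cos(el) * np.sin(az),
                               np.sin(el)])
        T = np.eye(4)
        T[:3, :3] = look_at(target + offset, target)
        T[:3, 3] = target + offset
        poses.append(T)
    return poses
```

Keeping the elevation range strictly below 90 degrees avoids the degenerate case in which the viewing direction is parallel to the up vector and the cross product vanishes.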

Real-World Evaluation

Real-world tests with VICON motion-capture ground truth showed that the markerless detection approach achieves a longer detection range and higher accuracy than fiducial markers. In the tested scenarios, the conventional markers failed beyond certain range thresholds, while the markerless approach was more accurate at larger distances, which is attributed to reduced perspective distortion.

Figure 4: Pose estimation errors of the markerless approach compared with conventional AprilTag detection, measured against VICON ground truth.
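
A comparison against motion-capture ground truth is commonly reported as translation and rotation errors, optionally grouped by distance to expose range effects. The sketch below computes the Euclidean translation error and the geodesic rotation error (in degrees) per frame and averages them within distance bins. It assumes the estimated and ground-truth poses are time-synchronized 4x4 transforms of the observed robot in the observing camera's frame; the binning scheme and function names are hypothetical, not the paper's evaluation code.

```python
import numpy as np


def pose_errors(T_est, T_gt):
    """Translation error (input units) and geodesic rotation error (degrees)."""
    dt = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    R_err = T_gt[:3, :3].T @ T_est[:3, :3]                  # relative rotation
    cos_a = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    return dt, np.degrees(np.arccos(cos_a))


def errors_by_range(T_est_list, T_gt_list, bin_edges):
    """Mean errors per ground-truth distance bin [edges[i-1], edges[i]).

    Returns {(lo, hi): (mean_trans_err, mean_rot_err_deg, n_samples)},
    skipping empty bins. Distance is the norm of the ground-truth translation.
    """
    dists = np.array([np.linalg.norm(T[:3, 3]) for T in T_gt_list])
    errs = np.array([pose_errors(e, g) for e, g in zip(T_est_list, T_gt_list)])
    idx = np.digitize(dists, bin_edges)    # i means edges[i-1] <= d < edges[i]
    out = {}
    for i in range(1, len(bin_edges)):
        mask = idx == i
        if mask.any():
            out[(bin_edges[i - 1], bin_edges[i])] = (
                float(errs[mask, 0].mean()),
                float(errs[mask, 1].mean()),
                int(mask.sum()),
            )
    return out
```

The same function could be run on marker-based and markerless estimates of the same sequence, making range-dependent differences visible bin by bin.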

Multi-Robot SLAM Integration

The markerless multi-robot SLAM system was evaluated during field tests with navigation experiments on Mount Etna. Across missions, the results show higher detection rates, longer maximum detection distances, shorter periods of open-loop navigation, and improved localization accuracy with respect to differential GNSS (D-GNSS) ground truth.

Figure 5: Multi-robot SLAM results from Mission 1, illustrating trajectory and localization error corrections following markerless detection events.

Figure 6: Multi-robot SLAM results from Mission 2, showcasing error corrections after markerless observations.
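
The field-test metrics can be illustrated in the same spirit. The sketch below computes the root-mean-square position error of a SLAM trajectory against D-GNSS samples and the lengths of the open-loop intervals between consecutive inter-robot detections. It assumes both trajectories are already time-synchronized and expressed in a common frame (trajectory alignment is omitted), and it treats each detection event as ending an open-loop interval; these definitions and the function names are assumptions for illustration only.

```python
import numpy as np


def position_rmse(est_xyz, gnss_xyz):
    """RMSE of the Euclidean position error over time-synchronized samples."""
    err = np.linalg.norm(np.asarray(est_xyz, dtype=float)
                         - np.asarray(gnss_xyz, dtype=float), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))


def open_loop_durations(t_start, t_end, detection_times):
    """Lengths of intervals without inter-robot detections within [t_start, t_end].

    Mission start and end act as boundaries, so a mission with no detections
    yields a single interval spanning the whole mission. Detections outside
    the mission window are ignored.
    """
    times = sorted(t for t in detection_times if t_start <= t <= t_end)
    bounds = [t_start] + times + [t_end]
    return [b - a for a, b in zip(bounds[:-1], bounds[1:])]
```

Comparing the longest open-loop interval of a marker-based run with that of a markerless run is one simple way to quantify the reduction in open-loop navigation time described above.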

Conclusion

The presented approach advances markerless robot detection and pose estimation for multi-agent SLAM, extending the detection range and improving localization accuracy compared with fiducial markers. Possible extensions include articulated robot configurations, and future work includes GPU deployment for real-time operation, which would support practical use in dynamic and perceptually challenging environments.