MR6D: Benchmarking 6D Pose Estimation for Mobile Robots

Published 19 Aug 2025 in cs.CV and cs.RO | (2508.13775v1)

Abstract: Existing 6D pose estimation datasets primarily focus on small household objects typically handled by robot arm manipulators, limiting their relevance to mobile robotics. Mobile platforms often operate without manipulators, interact with larger objects, and face challenges such as long-range perception, heavy self-occlusion, and diverse camera perspectives. While recent models generalize well to unseen objects, evaluations remain confined to household-like settings that overlook these factors. We introduce MR6D, a dataset designed for 6D pose estimation for mobile robots in industrial environments. It includes 92 real-world scenes featuring 16 unique objects across static and dynamic interactions. MR6D captures the challenges specific to mobile platforms, including distant viewpoints, varied object configurations, larger object sizes, and complex occlusion/self-occlusion patterns. Initial experiments reveal that current 6D pipelines underperform in these settings, with 2D segmentation being another hurdle. MR6D establishes a foundation for developing and evaluating pose estimation methods tailored to the demands of mobile robotics. The dataset is available at https://huggingface.co/datasets/anas-gouda/mr6d.

Summary

  • The paper introduces MR6D, a benchmark dataset for 6D pose estimation in mobile robotics using industrial objects like Euro pallets.
  • It evaluates pose estimation pipelines with both ground-truth and predicted segmentation masks, showing that segmentation quality strongly affects Average Recall (AR).
  • The dataset covers diverse scenarios including static, dynamic, and odometry-based tracking to enhance mobile robot perception.

The paper presents MR6D, a dataset aimed at benchmarking 6D pose estimation specifically for mobile robots operating in industrial settings. Unlike traditional datasets focused on small household items manipulated by robotic arms, MR6D targets larger, industrially relevant objects, addressing unique challenges posed by mobile robotics such as long-range perception, varied object sizes, and complex occlusion/self-occlusion patterns.

Introduction to Mobile Robot Perception

6D pose estimation enables robots to localize and manipulate objects precisely. Industrial mobile robots differ from robotic arms: they rely on specialized gripping mechanisms and encounter far more diverse camera viewpoints. They must detect larger objects at greater distances and cope with occlusion and self-occlusion under extreme camera angles, challenges that existing datasets do not address. MR6D bridges this gap by focusing on instance-level pose estimation, which is crucial for industrial tasks involving specific object models such as Euro pallets and standardized storage bins.

Figure 1: The objects used in our MR6D dataset. These objects are the same as those used in the MTevent dataset.

Dataset Collection Process

MR6D consists of four subsets that capture both static and dynamic interactions: Validation Static, Dynamic Test, O³dyn, and Mobile Robot-Like Static. The Validation subset is recorded with a VICON system for accurate camera tracking, while the Dynamic subset includes human interaction to simulate collaborative robot behavior. For environments without external tracking, the O³dyn subset estimates camera trajectories from odometry, later refined using VGGT model-based 3D reconstruction. Each subset presents distinct challenges, reflecting the diverse scenarios mobile robots encounter.

Figure 2: Coordinate transformations used for accurate annotation in validation and dynamic scenes.
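The annotation pipeline chains externally tracked camera poses with world-frame object poses. A minimal sketch of that coordinate-frame chaining (the pose values here are illustrative, not taken from the dataset):

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative poses (identity rotations, metres); real values would come from
# VICON tracking or odometry as described above.
T_world_obj = make_T(np.eye(3), np.array([2.0, 0.5, 0.0]))   # object in world frame
T_world_cam = make_T(np.eye(3), np.array([0.0, 0.0, 1.2]))   # camera in world frame

# Chain the transforms: pose of the object expressed in the camera frame.
T_cam_obj = np.linalg.inv(T_world_cam) @ T_world_obj
print(T_cam_obj[:3, 3])  # object translation relative to the camera
```

The same composition applies per frame in dynamic scenes, where `T_world_cam` changes as the robot moves.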

Figure 3: Annotating object 6D pose in the validation subset.

Data Statistics and Object Modeling

The dataset objects include common industrial items such as Euro pallets and KLT bins, each modeled with high-quality 3D meshes for accurate representation. Objects were selected for their non-standard gripper interactions and widespread availability, allowing the dataset to serve as a reference for synthesizing training data for instance-based pose estimation models. Figure 4 summarizes object dimensions, pose angle distributions, occurrences, and visibility statistics within MR6D.
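If the annotations follow the common BOP convention (an assumption; `scene_gt.json`, `cam_R_m2c`, and `cam_t_m2c` are BOP field names, and the values below are made up), a ground-truth pose record could be parsed as:

```python
import numpy as np

# Hypothetical BOP-style ground-truth record: row-major rotation, translation in mm.
scene_gt = {
    "0": [
        {"obj_id": 1,
         "cam_R_m2c": [1, 0, 0, 0, 1, 0, 0, 0, 1],
         "cam_t_m2c": [100.0, -50.0, 3000.0]},
    ]
}

def parse_pose(ann):
    """Return the object-to-camera rotation (3x3) and translation in metres."""
    R = np.asarray(ann["cam_R_m2c"], dtype=float).reshape(3, 3)
    t = np.asarray(ann["cam_t_m2c"], dtype=float) / 1000.0  # mm -> m
    return R, t

for frame_id, annotations in scene_gt.items():
    for ann in annotations:
        R, t = parse_pose(ann)
        print(f"frame {frame_id}: object {ann['obj_id']} at {t} m from the camera")
```

Note the 3 m translation: distances of this order are typical for mobile-robot viewpoints and far exceed the tabletop ranges of household benchmarks.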

Figure 4: Data statistics including dimensions, pose angles, occurrences, visibility, and distances.

Evaluation of Pose Estimation Pipelines

MR6D evaluates pose estimation pipelines using both ground-truth masks and unseen-object segmentation, pairing FoundationPose with CTL-generated masks. Results indicate that while 6D pose models generalize to novel objects, segmentation quality is a significant constraint: GT-masks + FoundationPose yields an Average Recall of 0.3462 across the test subsets, whereas CTL-generated masks lower AR to 0.1841. Figure 5 shows pipeline predictions against ground-truth annotations, underscoring segmentation as the crucial factor.
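Average Recall in BOP-style evaluation averages the per-threshold recall over a grid of error thresholds. A simplified stand-in (real BOP AR combines VSD, MSSD, and MSPD errors with per-metric threshold grids; the error values below are invented):

```python
import numpy as np

def average_recall(errors, thresholds):
    """Mean fraction of predictions whose pose error falls below each threshold."""
    errors = np.asarray(errors, dtype=float)
    recalls = [(errors < thr).mean() for thr in thresholds]
    return float(np.mean(recalls))

# Invented normalized pose errors for five targets, scored over ten thresholds.
errors = [0.02, 0.08, 0.16, 0.41, 0.90]
thresholds = np.linspace(0.05, 0.5, 10)
print(average_recall(errors, thresholds))  # ≈0.56
```

Under this scheme, worse segmentation inflates pose errors and directly depresses AR, which matches the gap between the GT-mask and CTL-mask results above.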

Figure 5: Quantitative results for GT-masks + FoundationPose.

Figure 6: Challenging cases in evaluation with GT-masks + FoundationPose.

Conclusion

MR6D establishes a new benchmark for 6D pose estimation tailored to mobile robots in industrial environments. Its instance-level focus on larger objects, combined with diverse environmental settings, addresses crucial perception challenges. The evaluation underscores the importance of improving 2D segmentation and suggests future enhancements to BOP metrics that account for detection distance and urgency. Future work may explore entity-level segmentation and improved depth estimation to preserve object integrity in 6D pose estimation pipelines.
