Exploring Category-level Articulated Object Pose Tracking on SE(3) Manifolds

Published 8 Nov 2025 in cs.CV, cs.AI, and cs.MM | (2511.05996v1)

Abstract: Articulated objects are prevalent in daily life and robotic manipulation tasks. However, compared to rigid objects, pose tracking for articulated objects remains an underexplored problem due to their inherent kinematic constraints. To address these challenges, this work proposes a novel point-pair-based pose tracking framework, termed PPF-Tracker. The proposed framework first performs quasi-canonicalization of point clouds in the SE(3) Lie group space, and then models articulated objects using Point Pair Features (PPF) to predict pose voting parameters by leveraging the invariance properties of SE(3). Finally, semantic information of joint axes is incorporated to impose unified kinematic constraints across all parts of the articulated object. PPF-Tracker is systematically evaluated on both synthetic datasets and real-world scenarios, demonstrating strong generalization across diverse and challenging environments. Experimental results highlight the effectiveness and robustness of PPF-Tracker in multi-frame pose tracking of articulated objects. We believe this work can foster advances in robotics, embodied intelligence, and augmented reality. Codes are available at https://github.com/mengxh20/PPFTracker.


Summary

The paper introduces PPF-Tracker, a framework that estimates incremental SE(3) poses to ensure geometric consistency in articulated objects.
It leverages dynamic keyframe selection and weighted point pair features to reduce rotation and translation errors while achieving real-time performance.
By mapping transformations through Lie algebra and enforcing kinematic constraints, the method demonstrates improved stability and generalization across diverse datasets.

Category-Level Articulated Object Pose Tracking on SE(3) Manifolds: An Expert Analysis

Motivation and Problem Formulation

Reliable pose tracking of articulated objects is a persistent challenge in robotics and embodied AI. Unlike rigid objects, articulated categories such as cabinets, laptops, or robot arms are characterized by dynamic multi-part structures governed by kinematic constraints. Category-level pose tracking further demands generalization to unseen instances within a class, eschewing reliance on CAD priors or instance-specific cues. The paper introduces a principled formulation of multi-part pose tracking as increment estimation over the SE(3) manifold, which avoids singularities and ensures geometric consistency.

Existing techniques often falter in two respects: (1) geometric inconsistency stemming from optimization over Euclidean parameterizations (e.g., Euler angles, or quaternions treated as unconstrained vectors), which leads to invalid rotations or unstable pose solutions, and (2) inefficient tracking methodologies that ignore temporal continuity and inter-part structural constraints.

Figure 1: The Categorization of Tracking Methods: egocentric, instance-level, and category-level approaches.

The PPF-Tracker Framework

The proposed PPF-Tracker algorithm models an articulated object as a set of rigid parts connected by joints and operates on a temporal stream of point clouds. The pipeline is structured around three main contributions:

Quasi-Canonicalization via Dynamic Keyframes: Temporal segments bounded by adaptively chosen keyframes minimize cumulative drift and enable robust pose increments. Dynamic selection leverages a geometric alignment energy function combining Chamfer and Hausdorff distances, ensuring keyframe reliability and responsiveness to scene dynamics.
Figure 2: Illustration of Temporal Segment and Dynamic Keyframe Selection in the frame stream.
Point Pair Feature-Based SE(3) Invariance and Voting: Pose increments are inferred via weighted Point Pair Features (PPF), which encode the local geometry between arbitrarily sampled point pairs. Orientation and translation voting accumulate evidence on SE(3)-invariant parameters, bypassing direct regression of transformation matrices. Pairs whose normals are perpendicular receive higher weights, which improves discriminative power over unweighted PPF sampling.
Figure 3: The Overview of the PPF-Tracker architecture and core components.

Figure 4: Traditional (a) versus Weighted (b) Point Pair selection, enhancing geometric sensitivity.
Lie Algebra Transformation & Kinematic-Constrained Optimization: By expressing pose increments in the Lie algebra $\mathfrak{se}(3)$ and mapping them back to SE(3) through the exponential map, the algorithm guarantees orthogonal rotation matrices and continuous, singularity-free pose prediction. Pose estimates are then refined by minimizing an energy function that combines geometric alignment with articulated kinematic constraints, enforcing physical plausibility across connected parts.
Figure 5: Illustration of Voting Scheme for orientation (circular and spherical bins) and translation parameter aggregation.
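To make the invariance argument concrete, the following minimal sketch computes the classic four-dimensional point pair feature (pair distance plus three angles) and checks nothing beyond that definition. It is not the authors' implementation: the function names are illustrative, and `pair_weight` is a hypothetical weighting chosen only so that perpendicular normals score highest, since this summary does not give the paper's actual weighting formula.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _norm(a):
    return math.sqrt(_dot(a, a))

def _angle(a, b):
    """Angle in radians between two non-zero 3-vectors."""
    c = _dot(a, b) / (_norm(a) * _norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against round-off

def point_pair_feature(p1, n1, p2, n2):
    """Classic PPF: (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)).

    p1, p2 are distinct 3D points and n1, n2 their surface normals.
    Every entry depends only on lengths and angles, so the feature is
    unchanged when one rigid motion is applied to both points and normals.
    """
    d = _sub(p2, p1)
    return (_norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2))

def pair_weight(n1, n2):
    """Hypothetical weight in [0, 1]: 1 for perpendicular normals, 0 for parallel.

    An assumption for illustration, not the paper's formula.
    """
    c = _dot(n1, n2) / (_norm(n1) * _norm(n2))
    return 1.0 - abs(c)
```

Because the feature is built only from distances and angles, a network that consumes it never sees the absolute pose of the part, which is what allows voting on SE(3)-invariant parameters instead of regressing a transformation matrix directly.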

Experimental Evaluation and Comparative Performance

PPF-Tracker is evaluated on synthetic (PM-Videos), semi-synthetic (ReArt-Videos), and real-world (RobotArm-Videos) datasets. The reported results include:

Eyeglasses Category: Mean rotation error is reduced to $3.3^{\circ}$ and translation error to $0.036$ m, while 3D IoU increases by 17.6% relative to the next-best baseline.
Dishwasher Category: Rotation error falls to $3.2^{\circ}$, with a translation error of $0.038$ m and a 3D IoU of 87.2%.
Real-Time Inference: PPF-Tracker runs at 0.07-0.16 s per frame, reported to surpass existing state-of-the-art methods.
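For readers unfamiliar with the metrics quoted above, the sketch below shows the standard definitions of rotation error (geodesic angle between two rotation matrices) and translation error (Euclidean distance). This is an assumption about the evaluation: the summary does not state the paper's exact protocol (for example, how symmetric parts are handled), and the function names are illustrative.

```python
import math

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Uses trace(R_pred^T R_gt) = sum of elementwise products, then
    theta = arccos((trace - 1) / 2).
    """
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp round-off
    return math.degrees(math.acos(cos_theta))

def translation_error(t_pred, t_gt):
    """Euclidean distance between two translation vectors, in their own units."""
    return math.dist(t_pred, t_gt)
```

Under these definitions, a reported rotation error of $3.3^{\circ}$ means the predicted part orientation differs from ground truth by a single rotation of that angle about some axis.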

Ablation studies highlight the critical roles of kinematic constraints (over 50% error reduction) and dynamic keyframe selection (further halving error relative to fixed strategies).
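The dynamic keyframe selection examined in this ablation relies on a geometric alignment energy that combines Chamfer and Hausdorff distances. A minimal brute-force sketch of such an energy follows. The weights `alpha` and `beta`, the `threshold`, and the rule "start a new keyframe when the energy exceeds the threshold" are all hypothetical choices for illustration; the summary does not give the paper's actual formulation.

```python
import math

def _nearest_dists(src, dst):
    """For each point in src, the distance to its nearest neighbour in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: sum of the two mean nearest-neighbour distances."""
    d_ab = _nearest_dists(a, b)
    d_ba = _nearest_dists(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the worst nearest-neighbour distance."""
    return max(max(_nearest_dists(a, b)), max(_nearest_dists(b, a)))

def alignment_energy(a, b, alpha=1.0, beta=1.0):
    """Hypothetical weighted combination of the two distances."""
    return alpha * chamfer_distance(a, b) + beta * hausdorff_distance(a, b)

def needs_new_keyframe(keyframe_pts, current_pts, threshold=0.05):
    """Assumed rule: refresh the keyframe once alignment degrades past a threshold."""
    return alignment_energy(keyframe_pts, current_pts) > threshold
```

The two terms are complementary: Chamfer distance averages over all points and reflects overall alignment, while Hausdorff distance reacts to the single worst outlier, so a combined energy responds both to gradual drift and to abrupt local changes. The O(N·M) search here is only suitable for small clouds; a spatial index would be needed in practice.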

Figure 6: Qualitative Results on PM-Videos demonstrating multi-part pose fidelity across frames.

Figure 7: Qualitative Results on ReArt-Videos (Top) and RobotArm-Videos (Bottom) highlighting generalization in semi-synthetic and real-world scenarios.

Theoretical and Practical Implications

The SE(3)-centric increment tracking paradigm mitigates the geometric inconsistency and drift prevalent in Euclidean-based solutions, especially under challenging articulated motion. Weighted PPF voting enables category-level generalization by exploiting local geometric invariances rather than instance memorization. Lie algebra mapping keeps pose updates stable and singularity-free.
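The reason a Lie algebra parameterization always yields a valid pose can be seen from the exponential map itself. The sketch below implements the standard closed-form map from a twist $(\omega, v) \in \mathfrak{se}(3)$ to a rotation and translation using Rodrigues' formula. It illustrates the general technique, not the authors' code; the function names and the left-composition convention in `apply_increment` are assumptions.

```python
import math

def _hat(w):
    """Skew-symmetric matrix K such that K x equals the cross product w × x."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]

def _matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def _matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def se3_exp(omega, v):
    """Exponential map from a twist (omega, v) in se(3) to (R, t) in SE(3)."""
    theta = math.sqrt(sum(w * w for w in omega))
    if theta < 1e-6:
        # Taylor expansions avoid dividing by a vanishing angle.
        a = 1.0 - theta**2 / 6.0
        b = 0.5 - theta**2 / 24.0
        c = 1.0 / 6.0 - theta**2 / 120.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta**2
        c = (theta - math.sin(theta)) / theta**3
    K = _hat(omega)
    K2 = _matmul(K, K)
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # R = I + a K + b K^2 (Rodrigues); V = I + b K + c K^2 couples rotation and translation.
    R = [[eye[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)] for i in range(3)]
    V = [[eye[i][j] + b * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]
    return R, _matvec(V, v)

def apply_increment(pose, increment):
    """Left-compose an increment onto a pose: T_new = T_inc * T_prev (assumed convention)."""
    R_prev, t_prev = pose
    R_inc, t_inc = increment
    R_new = _matmul(R_inc, R_prev)
    t_new = [x + y for x, y in zip(_matvec(R_inc, t_prev), t_inc)]
    return R_new, t_new
```

Any six numbers fed to `se3_exp` produce an orthogonal rotation matrix, so a network that outputs twists cannot emit an invalid rotation, in contrast to unconstrained regression of matrix entries or unnormalized quaternions.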

Practically, the method unlocks application potential in robotics, AR/VR, and embodied AI, where robust, real-time pose tracking of unseen articulated objects is critical. The category-level approach removes the dependency on CAD models, supporting deployment in unstructured, dynamic environments.

Theoretically, the kinematic-constrained optimization strategy bridges classic rigid-body pose estimation and physically consistent articulation modeling, providing a template extensible to more complex multi-joint, non-rigid setups.
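One way to picture a kinematic constraint between two connected parts is a residual for a revolute joint: if both parts are posed relative to a shared rest configuration, the joint axis and a pivot point on it should land in the same place under either part's pose. The sketch below computes that residual. It is a generic illustration under stated assumptions, not the paper's energy term; the function name, the pose representation as `(R, t)` pairs, and the choice of residuals are all hypothetical.

```python
import math

def _matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def _transform(pose, point):
    """Apply a rigid pose (R, t) to a 3D point."""
    R, t = pose
    return [r + s for r, s in zip(_matvec(R, point), t)]

def revolute_joint_residual(pose_parent, pose_child, axis, pivot):
    """Return (axis_error_rad, pivot_error) for a revolute joint.

    axis is a unit direction and pivot a point on the joint axis, both given
    in the shared rest frame (an assumption of this sketch). Both residuals
    are zero when the child differs from the parent only by a rotation
    about the joint axis.
    """
    R_a, _ = pose_parent
    R_b, _ = pose_child
    axis_a = _matvec(R_a, axis)
    axis_b = _matvec(R_b, axis)
    cos_angle = sum(x * y for x, y in zip(axis_a, axis_b))
    axis_err = math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp round-off
    pivot_err = math.dist(_transform(pose_parent, pivot),
                          _transform(pose_child, pivot))
    return axis_err, pivot_err
```

A penalty built from such residuals could be added to a geometric alignment term so that per-part pose estimates cannot drift apart at the joint, which is the kind of physical plausibility the kinematic-constrained optimization is described as enforcing.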

Speculation on Future Directions

Future research may build on this work by integrating self-supervised articulation learning and transferring SE(3)-invariant representations across domain gaps. Incorporating interaction priors, affordance-driven tracking, and adaptive part decomposition would further enhance manipulation and perception tasks. Extending the method to the full spectrum of deformable and soft-body objects will require generalizing the kinematic constraint mechanisms beyond rigid chains.

In robotics and AR/VR, PPF-Tracker's time-efficient, generalizable framework could facilitate closed-loop interaction, object-hand synergy, and persistent environment understanding at scale.

Conclusion

PPF-Tracker advances the state of the art in category-level articulated object pose tracking by combining SE(3)-invariant increment learning, weighted PPF voting, and kinematic-constrained optimization. The algorithm demonstrates strong accuracy, robustness, and real-time efficiency across synthetic and real-world articulated object datasets. Its principled formulation and empirical validation mark a substantial step toward practical, general-purpose articulated pose tracking systems for robotics, AR/VR, and embodied intelligence applications (2511.05996).
