PPF-Tracker: Articulated SE(3) Pose Tracking

Updated 15 November 2025
  • PPF-Tracker is a category-level articulated object pose tracking framework operating in SE(3), utilizing dynamic keyframes and point-pair features.
  • It integrates quasi-canonicalization, SE(3)-invariant learning, and tangent space voting to achieve robust tracking under complex kinematic conditions.
  • Its design supports real-time applications in robotics and augmented reality through efficient drift management and Gauss–Newton kinematic refinement.

PPF-Tracker is a category-level articulated object pose tracking framework operating in the SE(3) Lie group space, specifically designed to address the challenging problem of multi-part object pose tracking under complex, real-world kinematic conditions. Leveraging quasi-canonicalization and point-pair feature representations, PPF-Tracker integrates SE(3)-invariant learning, pose voting on tangent spaces, and explicit part-joint kinematic constraints. Its full pipeline delivers robust tracking for articulated structures in robotics, augmented reality, and embodied intelligence scenarios.

1. Quasi-Canonicalization on SE(3) Manifolds

PPF-Tracker defines a systematic quasi-canonicalization procedure for articulated objects comprising $K$ rigid parts. At each frame $t$, part-wise point clouds $\mathcal P_t^k$ and predicted poses $T_t^k\in\mathrm{SE}(3)$ are processed in reference to dynamic keyframes. Frames are partitioned into segments indexed by $i$, where each segment runs between successive keyframes.

A keyframe inverse is constructed for each part: $\mathcal K_i^k = (T_{n}^k)^{-1}$, with $n$ marking the segment start. Canonicalization transforms incoming clouds within segment $i$ via

$$\bar{\mathcal P}_t^k = \mathcal K_i^k(\mathcal P_t^k) = (T_{n}^k)^{-1}T_t^k(\mathcal P_t^k),\quad t\in[n,[i+1])$$

where, in practice, $\bar{\mathcal P}_t^k\approx \Delta T_t^k\cdot \mathcal P_c^k$ using previous estimates. Relative pose is expressed as

$$\Delta T_t^k = (T_{n}^k)^{-1}\,T_t^k$$

with absolute pose accumulation: $T_t^k = T_{n}^k\,\Delta T_t^k$.
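A minimal NumPy sketch of how the keyframe inverse, the relative pose, and the absolute pose fit together. It assumes each pose is a 4x4 homogeneous matrix mapping a canonical part cloud into the camera frame, and that the relative pose is the keyframe inverse composed with the current pose. The helper names (`make_pose`, `apply_pose`, `relative_pose`) and all numeric values are hypothetical and are not taken from the PPF-Tracker code.

```python
import numpy as np

def make_pose(angle_z, translation):
    """Build a 4x4 homogeneous pose: rotation by angle_z (radians) about z, plus a translation."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def apply_pose(T, points):
    """Apply a 4x4 pose to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

def relative_pose(T_key, T_t):
    """Pose of frame t relative to the keyframe: inv(T_key) @ T_t (assumed convention)."""
    return np.linalg.inv(T_key) @ T_t

rng = np.random.default_rng(0)
canonical = rng.random((100, 3))            # hypothetical canonical part cloud P_c
T_key = make_pose(0.3, [0.1, 0.0, 0.5])     # keyframe pose T_n
T_t = make_pose(0.5, [0.2, -0.1, 0.6])      # current pose T_t

observed = apply_pose(T_t, canonical)       # cloud as observed at frame t
# Quasi-canonicalization: apply the keyframe inverse to the observed cloud.
quasi = apply_pose(np.linalg.inv(T_key), observed)

delta = relative_pose(T_key, T_t)           # relative pose within the segment
recovered = T_key @ delta                   # absolute pose accumulation
```

Under these assumptions, `quasi` coincides with `apply_pose(delta, canonical)`, so the network only has to explain the residual motion since the last keyframe rather than the full absolute pose.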

Dynamic Keyframe Selection (DKS) centralizes drift management: after each prediction, energy is computed as

tt2

where tt3 and tt4 are Chamfer and Hausdorff distances. A new keyframe is triggered if tt5, with typical threshold tt6. This mechanism regulates frame reference updates to minimize drift and enhance motion adaptation.
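The keyframe trigger can be illustrated with a small NumPy sketch. The source names Chamfer and Hausdorff distances as the ingredients of the energy, but the exact combination and threshold are not reproduced here, so the weighted sum, the weights `w_cd` and `w_hd`, the threshold `tau`, and the choice of which two clouds are compared are all assumptions made for illustration only.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every point in a (N, 3) and every point in b (M, 3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def chamfer(a, b):
    """Symmetric Chamfer distance (one common variant: mean nearest-neighbour distance, both ways)."""
    d = pairwise_dist(a, b)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def hausdorff(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance, both ways."""
    d = pairwise_dist(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def keyframe_energy(current, reference, w_cd=1.0, w_hd=1.0):
    """Hypothetical energy: weighted sum of Chamfer and Hausdorff distances."""
    return w_cd * chamfer(current, reference) + w_hd * hausdorff(current, reference)

def needs_new_keyframe(current, reference, tau=0.05):
    """Trigger a new keyframe when the energy exceeds the (assumed) threshold tau."""
    return bool(keyframe_energy(current, reference) > tau)

rng = np.random.default_rng(0)
reference = rng.random((64, 3))     # hypothetical keyframe reference cloud
drifted = reference + 1.0           # the same cloud, displaced far from the reference
```

The brute-force distance matrix is quadratic in the number of points; for clouds of a few thousand points a k-d tree nearest-neighbour query would be the usual replacement.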

2. Point-Pair Feature Representation for Articulated Objects

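PPF-Tracker utilizes rigidity-invariant point-pair features. For points $p_i, p_j$ with normals $n_i, n_j$, the directional vector is

$$v_{ij} = p_j - p_i$$

and the canonical 4-D PPF encoding is

$$\mathrm{PPF}_{ij} = \big(\lVert v_{ij}\rVert_2,\ \angle(n_i, v_{ij}),\ \angle(n_j, v_{ij}),\ \angle(n_i, n_j)\big)$$

which is invariant under any rigid transformation $T\in\mathrm{SE}(3)$.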
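The invariance claim can be checked numerically. The sketch below assumes the standard four-component point-pair feature (pair distance, the two angles between each normal and the connecting vector, and the angle between the normals); the function names and sample values are hypothetical.

```python
import numpy as np

def angle(a, b):
    """Angle in radians between two 3-vectors, clipped for numerical safety."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def ppf(p_i, n_i, p_j, n_j):
    """Standard 4-D point-pair feature for two oriented points."""
    v = p_j - p_i
    return np.array([np.linalg.norm(v), angle(n_i, v), angle(n_j, v), angle(n_i, n_j)])

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

p_i, n_i = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
p_j, n_j = np.array([0.3, 0.1, 0.2]), np.array([0.0, 0.6, 0.8])

# An arbitrary rigid transformation: points get R and t, normals get R only.
R = rotation_z(0.7) @ rotation_x(-1.1)
t = np.array([0.5, -2.0, 1.0])

before = ppf(p_i, n_i, p_j, n_j)
after = ppf(R @ p_i + t, R @ n_i, R @ p_j + t, R @ n_j)
```

Because the feature depends only on one length and three angles, `before` and `after` agree up to floating-point error for any rotation and translation.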

A learned pair-wise weighting, based on normal angle Ptk\mathcal P_t^k2, is introduced: Ptk\mathcal P_t^k3 Biasing against nearly-parallel pairs enhances voting contrast in subsequent network heads. A set of Ptk\mathcal P_t^k4 point pairs, each with its weighted PPF and optionally their joint coordinates Ptk\mathcal P_t^k5, is propagated through a PointNet++ backbone capturing relevant geometric relationships.

3. SE(3)-Tangent Pose Voting with Explicit Parameterization

Following feature extraction, the network splits into five prediction heads per part Ptk\mathcal P_t^k6:

  • Translation votes: Ptk\mathcal P_t^k7
  • Orientation votes: Ptk\mathcal P_t^k8
  • Scale regressor: Ptk\mathcal P_t^k9

Let TtkSE(3)T_t^k\in\mathrm{SE}(3)0 denote the canonical part center, TtkSE(3)T_t^k\in\mathrm{SE}(3)1 the axes, and TtkSE(3)T_t^k\in\mathrm{SE}(3)2 as above. The translation parameters

TtkSE(3)T_t^k\in\mathrm{SE}(3)3

describe circles of possible part centers. The orientation parameters

TtkSE(3)T_t^k\in\mathrm{SE}(3)4

vote for canonical rotation.

Each PPF casts soft votes, via a small MLP, into discretized translation (TtkSE(3)T_t^k\in\mathrm{SE}(3)5 bins) and orientation (TtkSE(3)T_t^k\in\mathrm{SE}(3)6 Fibonacci sphere bins) histograms. Maxima are extracted for continuous estimates TtkSE(3)T_t^k\in\mathrm{SE}(3)7, and scale TtkSE(3)T_t^k\in\mathrm{SE}(3)8 is regressed through MSE loss.
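To make the orientation histogram concrete, here is a small sketch of voting over Fibonacci sphere bins. It replaces the learned MLP votes with noisy direction samples and uses hard nearest-bin assignment, both of which are simplifications for illustration; the bin count of 256, the noise level, and the function names are assumptions rather than values from the source.

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n approximately uniform unit vectors on the sphere (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i      # golden-angle increments
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def vote_direction(directions, weights, bins):
    """Accumulate weighted votes into the nearest bin and return the winning bin direction."""
    nearest = np.argmax(directions @ bins.T, axis=1)   # largest dot product = nearest bin
    hist = np.bincount(nearest, weights=weights, minlength=len(bins))
    return bins[np.argmax(hist)], hist

rng = np.random.default_rng(0)
bins = fibonacci_sphere(256)                    # hypothetical bin count
true_dir = np.array([1.0, 2.0, 2.0]) / 3.0      # unit vector to recover
votes = true_dir + 0.05 * rng.standard_normal((200, 3))
votes /= np.linalg.norm(votes, axis=1, keepdims=True)

estimate, hist = vote_direction(votes, np.ones(len(votes)), bins)
```

The returned estimate is quantized to the bin grid; a continuous estimate would typically refine it, for example with a weighted mean of the votes that fall in or near the winning bin.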

From TtkSE(3)T_t^k\in\mathrm{SE}(3)9, an element ii0 is constructed: ii1 where analytical mappings follow Eade (2013). Pose updates are performed in tangent space: ii2 with exponential mapping ensuring rotation matrix orthogonality.
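The exponential map that turns a tangent-space element into a pose can be sketched with the closed-form expressions commonly given for SE(3) (as in Eade's notes). The twist ordering (translation part first, rotation part second) and the right-multiplicative update in the last line are assumed conventions, and the function names are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x equals the cross product w x x."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Map a 6-vector xi = (u, w) in se(3) to a 4x4 matrix in SE(3)."""
    u, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # Small-angle limits of the coefficients below.
        A, B, C = 1.0, 0.5, 1.0 / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (1.0 - A) / theta**2
    R = np.eye(3) + A * W + B * (W @ W)     # Rodrigues formula
    V = np.eye(3) + B * W + C * (W @ W)     # couples rotation into the translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ u
    return T

xi = np.array([0.1, -0.2, 0.3, 0.4, 0.5, -0.6])
T_prev = np.eye(4)
T_new = T_prev @ se3_exp(xi)    # assumed right-multiplicative tangent-space update
```

Because the rotation block comes from the Rodrigues formula rather than from an unconstrained 3x3 regression, it is orthogonal with unit determinant by construction.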

4. Kinematic Constraints and Joint-Axis Optimization

The framework incorporates kinematic-constraint refinement for articulated joints. For ii3 joints interconnecting ii4 parts, revolute joints rotate about axis ii5, prismatic joints slide along it. Joint ii6 is characterized by reference point ii7 and direction ii8.

Two energy terms define the optimization:

  1. Geometric alignment per part:

ii9

  2. Kinematic coupling per joint:

Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}0

with axis and translation constraints depending on joint type.

The total objective,

Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}1

is minimized, typically via Gauss–Newton, to yield refined pose estimates Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}2. This step enforces consistency of joint articulation across parts and frames.
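As a toy illustration of Gauss–Newton refinement under a joint constraint, the sketch below estimates a single revolute joint angle so that a child part, rotated about a known axis, aligns with its observed points. This is a deliberately reduced one-parameter problem with known correspondences, not the framework's full objective; the residual, the function names, and all values are assumptions.

```python
import numpy as np

def rotate_about_axis(points, q, u, theta):
    """Rotate (N, 3) points by theta about the line through q with unit direction u."""
    p = points - q
    c, s = np.cos(theta), np.sin(theta)
    return p * c + np.cross(u, p) * s + np.outer(p @ u, u) * (1.0 - c) + q

def d_rotate_dtheta(points, q, u, theta):
    """Derivative of rotate_about_axis with respect to theta."""
    p = points - q
    c, s = np.cos(theta), np.sin(theta)
    return -p * s + np.cross(u, p) * c + np.outer(p @ u, u) * s

def gauss_newton_joint_angle(child, observed, q, u, theta0=0.0, iters=10):
    """Minimize the sum of squared point residuals over the joint angle theta."""
    theta = theta0
    for _ in range(iters):
        r = (rotate_about_axis(child, q, u, theta) - observed).ravel()   # residual vector
        J = d_rotate_dtheta(child, q, u, theta).ravel()                  # Jacobian (one column)
        step = -(J @ r) / (J @ J)        # normal equations for a single parameter
        theta += step
        if abs(step) < 1e-12:
            break
    return theta

rng = np.random.default_rng(0)
q = np.array([0.1, 0.0, 0.2])            # hypothetical joint reference point
u = np.array([0.0, 0.0, 1.0])            # hypothetical unit joint axis
child = rng.random((50, 3))
observed = rotate_about_axis(child, q, u, 0.7)   # ground-truth angle 0.7 rad

theta_est = gauss_newton_joint_angle(child, observed, q, u)
```

With more parameters (per-part poses plus joint states) the scalar division becomes a linear solve of the normal equations, but the loop structure stays the same.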

5. Pipeline Overview and Implementation Pseudocode

The PPF-Tracker process operates as a stream on input clouds and initial poses. The following pseudocode details the core steps:

nn1 This single-stream design supports online operation and naturally accommodates dynamic keyframe selection and kinematic refinement.

6. Network Architecture, Loss Functions, and Training Protocols

PPF-Tracker deploys a PointNet++ backbone for feature learning over weighted point-pair features. Four heads operate in parallel:

  • Translation: Predicts softmax histograms for Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}3 with Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}4 bins
  • Orientation: Predicts softmax histogram over Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}5 bins for Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}6
  • Scale: Regression for Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}7
  • Mask: Optional part segmentation via binary prediction

Loss functions are constructed as follows:

  • Translation and orientation: KL-divergence on softmax voting outputs
  • Scale: Mean squared error (MSE)
  • Mask: Binary cross-entropy (BCE)

The final loss combines all components: Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}8 Training is conducted for 200 epochs using Adam optimizer with initial learning rate Kik=(Tnk)1\mathcal K_i^k = (T_{n}^k)^{-1}9, decayed by 0.1 every 10 epochs, and input clouds downsampled to 3072 points. Inference is performed per frame with runtime nn0 on RTX 4090-class hardware, demonstrating suitability for real-time robotic or AR scenarios.
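The loss terms listed above can be sketched in plain NumPy. The per-term formulas are the standard definitions of KL divergence, MSE, and binary cross-entropy; how the terms are weighted in the combined loss is an assumption here (unit weights), as are the function names and tensor shapes.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(target, pred, eps=1e-12):
    """Mean KL(target || pred) over a batch of histograms of shape (N, bins)."""
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps)), axis=-1).mean())

def mse(pred, target):
    """Mean squared error, used here for the scale regressor."""
    return float(np.mean((pred - target) ** 2))

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy on predicted probabilities, used here for the optional mask."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))

def total_loss(trans, orient, scale, mask, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; each argument is a (prediction, target) pair."""
    terms = (kl_divergence(trans[1], trans[0]),
             kl_divergence(orient[1], orient[0]),
             mse(*scale),
             bce(*mask))
    return sum(w * t for w, t in zip(weights, terms))

rng = np.random.default_rng(0)
pred_hist = softmax(rng.standard_normal((8, 32)))     # hypothetical voting outputs
target_hist = softmax(rng.standard_normal((8, 32)))   # hypothetical soft targets
mask_prob = rng.random(8)
mask_label = (rng.random(8) > 0.5).astype(float)

loss = total_loss((pred_hist, target_hist), (pred_hist, target_hist),
                  (np.ones(3), np.ones(3)), (mask_prob, mask_label))
```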

A plausible implication is that PPF-Tracker's dynamic keyframe mechanism can adapt to unpredictable motion patterns and maintain low drift even in long sequences.

7. Applications and Implementation Considerations

PPF-Tracker is applicable to pose tracking in multi-part robotic manipulators, articulated AR objects, and category-level scene understanding, wherever rigid part motion is constrained by physically plausible kinematic joints. The framework supports extension to broader categories given annotation of joint axes.

Resource requirements are compatible with real-time deployment given modern GPUs, and the modular pipeline with explicit keyframing and refinement facilitates integration with higher-level control, mapping, or semantic segmentation subsystems.

Its empirical generalization across synthetic and real-world scenarios suggests robustness to domain shift. Code and pretrained models are available at https://github.com/mengxh20/PPFTracker. Lie group background follows Eade (2013).

Below is a concise summary of design choices:

Component            | Key Method            | Implementation
Feature Backbone     | PointNet++            | $(v_{ij}, \mathrm{PPF}_{ij})$
Voting               | Softmax + MLP heads   | Histograms, MSE
Kinematic Refinement | Gauss–Newton          | $\mathcal E_{\rm comp}$
Keyframe Policy      | Dynamic, energy-based | Chamfer, Hausdorff

This synthesis represents the current canonical implementation and research status of PPF-Tracker for articulated pose tracking in SE(3).
