
Motion Prior SLAM: Dynamic Environments

Updated 19 February 2026
  • Motion Prior SLAM is a collection of methodologies that integrate physically informed motion models into SLAM to enable robust mapping in dynamic and heterogeneous environments.
  • It leverages diverse models—from constant velocity and spline-based representations to IMM and Ackermann constraints—to accurately track dynamic objects and multi-agent interactions.
  • These techniques improve pose accuracy and reduce trajectory errors, with applications spanning robotics, autonomous driving, and AR/VR.

Motion Prior SLAM refers to a suite of methodologies for Simultaneous Localization and Mapping (SLAM) that explicitly incorporates physically informed priors on the motion of agents or objects. These motion priors range from simple constant-velocity assumptions to complex, learned, or environment-specific models (e.g., Ackermann steering, multi-agent interactions). The integration of motion priors fundamentally extends classic SLAM from the static-world assumption to dynamic, heterogeneous, and interactive environments. Techniques under this umbrella address dynamic scenes, multi-object tracking, multi-agent traffic, and physically grounded kinematic constraints, targeting applications in robotics, autonomous driving, and AR/VR.

1. Mathematical Foundations of Motion Priors in SLAM

Motion priors are formalized as additional constraints or regularization terms on the time evolution of poses and landmark/object states within SLAM’s optimization or filtering backend. The mathematical expression depends on the scenario:

  • Rigid-Body Constant Motion: For dynamic objects, the pose evolution is modeled as a constant body-fixed increment in SE(3):

{}_{k}^{k}H_{k+1} = C, \quad \forall k,

or in the world frame:

{}^{0}_{k}H_{k+1} = H = \exp(u), \quad u \in \mathfrak{se}(3).

This results in ternary motion factors connecting object pose/landmark states at k and k+1 to a constant twist u (Henein et al., 2018).
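As an illustrative sketch (not the paper's implementation), the constant-motion factor above can be expressed as a residual on a point landmark transported by H = exp(u); the function names and the series-based exponential are hypothetical simplifications:

```python
import numpy as np

def hat(u):
    """Map a twist u = (omega_x, omega_y, omega_z, v_x, v_y, v_z) to its 4x4 se(3) matrix."""
    wx, wy, wz, vx, vy, vz = u
    return np.array([[0.0, -wz,  wy, vx],
                     [ wz, 0.0, -wx, vy],
                     [-wy,  wx, 0.0, vz],
                     [0.0, 0.0, 0.0, 0.0]])

def se3_exp(u, n_terms=20):
    """Matrix exponential of hat(u) via a truncated power series."""
    A = hat(u)
    X, term = np.eye(4), np.eye(4)
    for k in range(1, n_terms):
        term = term @ A / k
        X = X + term
    return X

def motion_factor_residual(l_k, l_k1, u):
    """Ternary motion-factor residual: the landmark at k+1 should equal
    the landmark at k transported by the constant motion H = exp(u)."""
    H = se3_exp(u)
    p = np.append(l_k, 1.0)          # homogeneous point
    return l_k1 - (H @ p)[:3]
```

In a factor graph, this residual would connect the two landmark states and the shared twist variable, so the same u is constrained by every consecutive landmark pair on the object.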

  • Spline-Based Motion Models: DynaGSLAM parameterizes the center of each dynamic Gaussian in 3DGS representation via a cubic Hermite spline, whose control points are means and velocities inferred from 3D optical flow:

m^d(\tau) = (2s^3 - 3s^2 + 1)\,m^d(t-1) + (s^3 - 2s^2 + s)\,v^d(t-1) + (-2s^3 + 3s^2)\,m^d(t) + (s^3 - s^2)\,v^d(t)

with s = \tau - (t-1) for unit frame spacing. The motion-prior term E_p analytically enforces spline consistency (Li et al., 15 Mar 2025).
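A minimal sketch of the Hermite evaluation above, assuming unit frame spacing (the function name and argument layout are illustrative, not DynaGSLAM's API):

```python
import numpy as np

def hermite_center(m_prev, v_prev, m_curr, v_curr, tau, t):
    """Cubic Hermite interpolation of a dynamic Gaussian center between
    frames t-1 and t, with control points (mean, velocity) at each end."""
    s = tau - (t - 1)                 # normalized time in [0, 1]
    h00 = 2*s**3 - 3*s**2 + 1         # basis weight on m^d(t-1)
    h10 = s**3 - 2*s**2 + s           # basis weight on v^d(t-1)
    h01 = -2*s**3 + 3*s**2            # basis weight on m^d(t)
    h11 = s**3 - s**2                 # basis weight on v^d(t)
    return h00*m_prev + h10*v_prev + h01*m_curr + h11*v_curr
```

The basis functions interpolate the endpoint means exactly (s = 0 and s = 1) while matching the endpoint velocities, which is what makes the spline a smooth motion prior rather than a piecewise-linear one.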

  • Interacting Multiple Model (IMM): In Visual SLAMMOT, object state is simultaneously filtered through a bank of motion models (circular, constant velocity, constant turn-rate and velocity), with model weights w_{i,t}^d and dynamics

x_{t+1}^d = g^d(x_t^d) + w_t^d, \quad w_t^d \sim \mathcal{N}(0, Q^d),

forming a weighted sum of residuals in the global SLAM bundle adjustment (Tian et al., 2024).
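The mode-weighted residual can be sketched as follows; the two 1-D motion models and the state layout are hypothetical stand-ins for the paper's model bank:

```python
import numpy as np

# Hypothetical motion models over state x = [position, velocity]:
def const_velocity(x):
    return np.array([x[0] + x[1], x[1]])   # position advances by velocity

def stationary(x):
    return np.array([x[0], 0.0])           # object assumed at rest

def imm_motion_residual(x_next, x_curr, models, weights):
    """Mode-weighted motion-prior residual: each model g_i predicts the
    next state, and the per-model residuals are combined with the
    current IMM weights before entering the bundle adjustment."""
    x_next, x_curr = np.asarray(x_next), np.asarray(x_curr)
    return sum(w * (x_next - g(x_curr)) for w, g in zip(weights, models))
```

When the weights concentrate on the model that actually governs the object, the combined residual vanishes for motion consistent with that model, which is how the IMM scheme avoids penalizing objects for violating the wrong prior.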

  • Vehicle Kinematic Priors: In OpenGV 2.0, non-holonomic Ackermann constraints are imposed via infinitesimal or discrete-time penalties enforcing vehicle velocity alignment with the forward axis:

\| R(t)\,[0; 1; 0] - \eta(v(t)) \|^2

at each timestamp (Huang et al., 5 Mar 2025).
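A sketch of the non-holonomic penalty above, assuming η normalizes the velocity vector (the function name and the small-velocity guard are illustrative, not OpenGV 2.0's implementation):

```python
import numpy as np

def ackermann_residual(R, v, eps=1e-9):
    """Squared alignment error between the vehicle's forward axis
    R @ [0, 1, 0] and the normalized instantaneous velocity direction.
    Zero when the vehicle moves exactly along its forward axis."""
    forward = R @ np.array([0.0, 1.0, 0.0])
    eta = v / max(np.linalg.norm(v), eps)   # normalized velocity direction
    d = forward - eta
    return float(d @ d)
```

Penalizing lateral velocity in this way is what removes the sideways-slip modes that generic bundle adjustment cannot distinguish from genuine motion.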

  • Multi-Agent Game-Theoretic Priors: In GTP-SLAM, agent motion and interaction dynamics are jointly encoded in a potential function, ensuring that the solution is a Nash equilibrium in a dynamic, multi-agent system (Chiu et al., 2022).

2. System Architectures and Factor Graph Integration

Motion Prior SLAM systems are typically implemented as factor graph frameworks, where motion priors appear as additional graph nodes and ternary/quaternary factors:

  • Classical Factorization: Robot poses, static landmarks, and dynamic object landmarks constitute variable nodes. Motion priors are realized as factors—e.g., r^{mot}_{k,i,j}(l_k^i, l_{k+1}^i, u^j)—linking sequential states and the group twist parameter (Henein et al., 2018).
  • Gaussian Splatting & Dynamic Gaussians: DynaGSLAM maintains parallel lists of static and dynamic Gaussians, with dynamic centers evolving by spline motion priors and scheduled for deletion if unobserved (Li et al., 15 Mar 2025).
  • Back-end Optimization: In Visual SLAMMOT (IMM), the full state comprises camera poses T_{1:T}, the static point set \{m_j\}, and object states for all models \{s^d_{i,t}\}. The negative log-posterior minimized includes odometry, reprojection, and mode-weighted motion prior residuals (Tian et al., 2024).
  • Multi-object and Multi-agent Systems: VDO-SLAM and GTP-SLAM treat each moving object/agent as a first-class entity in the factor graph, with per-object SE(3) motion trajectories and, in the latter, mutual game-theoretic interaction terms (Zhang et al., 2020, Chiu et al., 2022).

The joint optimization is commonly solved via Gauss–Newton or Levenberg–Marquardt, with implementation-specific strategies for sparsity and variable ordering.
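For intuition, a bare-bones Gauss–Newton loop on a stacked residual vector looks like this; real SLAM backends use analytic Jacobians, sparse linear algebra, and careful variable ordering rather than the dense finite-difference Jacobian sketched here:

```python
import numpy as np

def gauss_newton(residual_fn, x0, iters=10, eps=1e-6):
    """Minimize ||r(x)||^2 by repeated linearization. The Jacobian is
    built column-by-column with forward differences purely for
    illustration; each step solves the normal equations J^T J dx = -J^T r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (residual_fn(x + dx) - r) / eps
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x
```

Levenberg–Marquardt differs only in damping the normal equations (J^T J + λI), trading step size for robustness far from the optimum.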

3. Representative Motion Priors and Detection Methodologies

The choice of motion prior is adapted to data modalities, scene content, and domain-specific constraints:

Domain Motion Prior Type Example Method
Dynamic Rigid Objects Constant velocity in SE(3) Motion-Prior SLAM (factor graph) (Henein et al., 2018)
3DGS Representation Cubic Hermite spline (positions/velocities) DynaGSLAM (Li et al., 15 Mar 2025)
Multi-agent traffic Potential-game interaction GTP-SLAM (Chiu et al., 2022)
Visual-Inertial SLAM IMU-based per-landmark motion IDY-VINS (Sun et al., 30 Mar 2025)
Surround-view Vehicle Ackermann (non-holonomic) OpenGV 2.0 (Huang et al., 5 Mar 2025)
MOT Integration Interacting Multiple Model Visual SLAMMOT (Tian et al., 2024)

Motion detection and dynamic region association often rely on combinations of optical flow (e.g., RAFT or PWC-Net), semantic/instance segmentation (e.g., Mask R-CNN, SAM2), and geometric/epipolar consistency. For dynamic object management, observations are clustered, tracked, and matched with priors driving either association or downweighting/removal in optimization (IDY-VINS employs a \chi^2_1 test on per-landmark projection error, whereas DynaGSLAM uses blast-radius and nearest-neighbor criteria).
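The chi-square gating idea can be sketched in a few lines; the function name, the fixed 95% threshold, and the scalar error model are illustrative assumptions rather than IDY-VINS's exact test:

```python
def is_dynamic(min_proj_err, sigma, thresh=3.841):
    """Flag a landmark as dynamic when its minimum projection error,
    normalized by the measurement standard deviation sigma, exceeds the
    95th percentile of the chi-square distribution with 1 degree of
    freedom (3.841). Such landmarks are downweighted or removed before
    bundle adjustment."""
    return (min_proj_err / sigma) ** 2 > thresh
```

Gating against a motion-prior-predicted reprojection, rather than a raw epipolar check, is what lets the test separate genuinely moving points from points that merely violate the static-world assumption due to noise.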

4. Integration with Perceptual Front-Ends and Multi-Object Tracking

Motion priors are typically fused with standard SLAM front-ends (stereo, RGB-D, inertial, visual-inertial):

  • Visual-Inertial Systems: IDY-VINS preprocesses each tracked feature using IMU-preintegrated motion priors to compute minimum projection errors, filtering dynamic outliers before bundle adjustment (Sun et al., 30 Mar 2025).
  • Feature/Direct Hybrid Systems: The two-layer system in (Krombach et al., 2018) uses fast feature-based stereo VO as a rigid-body motion prior to initialize semi-dense direct alignment, enabling robust tracking through large inter-frame motions by fusing both in the cost function via a soft pose-prior penalty.
  • Joint SLAM+MOT: Visual SLAMMOT adopts a three-stage pipeline: (1) standard SLAM (e.g., ORB-SLAM2), (2) deep-learned MOT frontend for instance segmentation and association, and (3) IMM filtering/bundle-adjustment, whose graph includes both visual and motion-prior residuals for each candidate model (Tian et al., 2024).
  • Gaussian Splatting: DynaGSLAM uniquely employs a time-varying Gaussian-splatting scene representation, supporting real-time rendering, tracking, and prediction in dynamic scenes by integrating motion priors directly into the representation (Li et al., 15 Mar 2025).

5. Performance, Empirical Results, and Benchmarking

Empirical studies across various domains demonstrate that Motion Prior SLAM systems yield substantive improvements over static-world or naive dynamic-scene baselines:

  • Static vs. Dynamic Handling (DynaGSLAM): On OMD, TUM-walking, and Bonn Balloon, DynaGSLAM achieves PSNR and SSIM gains, with DynaPSNR up to +10 dB over anti-dynamic GS-SLAM. Dynamic-region rendering remains plausible in interpolation/extrapolation, in contrast to "smeared, ghosted artifacts" with static GS-SLAM (Li et al., 15 Mar 2025).
  • Structure and Trajectory Error Reduction: In Motion-Prior SLAM, incorporation of constant-motion factors reduces structure errors (ASE, RSE) and pose errors (ATE, RTE, RRE) by 50–70% relative to static-only baselines in both simulated and real robot datasets (Henein et al., 2018).
  • Visual-Inertial Robustness: IDY-VINS cuts ATE by up to 47% versus VINS-Fusion across all dynamic levels in the VIODE and EUROC datasets, while eliminating map "ghosts" (Sun et al., 30 Mar 2025).
  • MOT and SLAM Consistency: On KITTI, Visual SLAMMOT's Level 3 (IMM) yields improvements over decoupled systems in both APE (2.60 m vs. 2.70 m) and MOTP (2.56 m vs. 2.70 m) (Tian et al., 2024).
  • Non-holonomic (Ackermann) Vehicles: OpenGV 2.0's FSBA achieves translation errors of 8 mm and sub-0.1° rotation error on KITTI-VO, outperforming classical bundle adjustment and achieving low drift with large-scale urban data sets (Huang et al., 5 Mar 2025).
  • Multi-Agent Nash Equilibria: GTP-SLAM achieves 20–50% lower RMSE than conventional bundle-adjustment in a multi-agent highway scenario, remaining robust even at high observation noise (Chiu et al., 2022).

6. Applications, Insights, and Limitations

Motion prior-based SLAM enables robust operation in scenarios with significant dynamics, non-rigid agents, or physical/traffic constraints. Applications include:

  • Real-time photorealistic 3D mapping with dynamic objects (DynaGSLAM) (Li et al., 15 Mar 2025).
  • Object-centric mapping and motion prediction for navigation/planning (VDO-SLAM) (Zhang et al., 2020).
  • Visual-inertial localization in urban driving with heavy scene dynamics (IDY-VINS) (Sun et al., 30 Mar 2025).
  • Large-scale vehicle SLAM with minimal field-of-view overlap and strong motion non-holonomicity (OpenGV 2.0) (Huang et al., 5 Mar 2025).
  • Unified, feedback-aware mapping and tracking in multi-agent environments (GTP-SLAM) (Chiu et al., 2022).
  • Robust intersection of SLAM and MOT (Visual SLAMMOT), with per-object model selection (Tian et al., 2024).

A key insight is that naive coupling with oversimplified motion models can degrade performance under high measurement noise, whereas mode-adaptive schemes (e.g., IMM) restore robustness (Tian et al., 2024). Additionally, physical motion constraints remove unobservable modes present in generic bundle adjustment, leading to improved scale recovery and drift suppression (Huang et al., 5 Mar 2025).

Limitations may include the inability to estimate online uncertainty of the final joint state (Visual SLAMMOT), dependence on the quality of segmentation/association, and possible misfit under motion model violations or unmodeled dynamics.

7. Future Directions and Open Challenges

Research fronts in Motion Prior SLAM focus on:

  • Incorporation of richer or learned motion priors (e.g., from non-rigid dynamics or neural predictors).
  • Incremental, scalable solvers (e.g., iSAM2, GPU-accelerated optimization) tailored for dense, multimodal graphs arising from hybrid SLAM+MOT systems.
  • Full uncertainty quantification in joint state estimation for downstream planning.
  • Joint exploitation of semantic cues and physical priors for robust real-time mapping in unconstrained environments.

As datasets, sensors, and autonomous applications grow in complexity, motion prior SLAM frameworks are expected to further integrate kinematic, dynamic, and agent-interaction models to ensure perception robustness and accuracy in real-world dynamic scenes.
