
Keyframe-Based Sliding Window Filter

Updated 11 February 2026
  • Keyframe-Based Sliding Window Filter is a state-of-the-art approach integrating tightly-coupled IMU and camera data within a fixed-lag keyframe window to achieve accurate visual-inertial odometry.
  • It employs a rigorous process model and first-estimate Jacobians to ensure estimator consistency and robustness even during standstill and challenging motion regimes.
  • The method enables real-time performance with online self-calibration of all critical sensor parameters, balancing computational efficiency with state observability.

The Keyframe-Based Sliding Window Filter (KSWF) is a principled approach for visual-inertial odometry (VIO) and full sensor self-calibration, operating by combining tightly-coupled IMU and camera measurements within a fixed-lag window of “keyframes.” The approach ensures estimator consistency, maintains real-time performance, and is robust to standstills and challenging motion regimes. KSWF has been established as a reference architecture for achieving both high-accuracy motion estimation and online calibration of all critical sensor parameters—including camera intrinsics, extrinsics, temporal offsets, and IMU systematics—in modern VIO pipelines (Huai et al., 2020, Huai et al., 2022).

1. State Representation and Structure

At time $t$, the KSWF maintains an augmented state vector that includes:

  • The current navigation state $\pi(t) = \{p_{WB}(t), R_{WB}(t), v_{WB}(t)\}$, where $p_{WB}(t) \in \mathbb{R}^3$ is position, $R_{WB}(t) \in SO(3)$ is orientation, and $v_{WB}(t) \in \mathbb{R}^3$ is velocity of the IMU frame $B$ relative to the world $W$.
  • Static IMU parameter block $x_S$ (or $x_{imu}$), comprising gyroscope/accelerometer biases, scale, misalignments, and g-sensitivity matrices (e.g., $b_g$, $b_a$, $M_g$, $T_s$, $M_a$).
  • Static camera parameter block $x_C$ (or $x_c$), for each camera $k$: extrinsics $T_{BC_k}$, 8-parameter intrinsics and distortion, time offset $t_d^k$, and rolling-shutter readout time $t_r^k$.
  • A window $x_W$ (or $x_w$) of $N_{kf}$ keyframe clones and $N_{tf}$ recent temporal clones of the full navigation state $\pi(t_j)$.
  • If using landmark-based updates: a bank $x_L$ of currently-triangulated 3D features, usually parameterized by inverse depth and anchor keyframe/camera.

This full-state formulation enables joint propagation and update of robot pose, velocity, sensor calibration, and (optionally) a set of landmarks, within a sliding window.
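As a rough illustration of how these blocks stack up, the sketch below tallies the error-state dimension of such a filter. All names (`CameraBlock`, `KswfStateLayout`) and the default sizes are hypothetical. In particular, the 33-dimensional IMU block assumes full 3×3 matrices for $M_g$, $T_s$, and $M_a$, the window sizes are arbitrary, and each landmark is assumed to take three parameters; an actual implementation may use reduced parameterizations.

```python
from dataclasses import dataclass, field
from typing import List

NAV_DIM = 9  # error-state size of (p, R, v): 3 + 3 + 3


@dataclass
class CameraBlock:
    """Per-camera calibration block x_C (sizes follow the list above)."""
    extrinsics_dim: int = 6   # T_BC: rotation (3) + translation (3)
    intrinsics_dim: int = 8   # projection + distortion parameters
    time_offset_dim: int = 1  # t_d
    readout_dim: int = 1      # t_r

    def dim(self) -> int:
        return (self.extrinsics_dim + self.intrinsics_dim
                + self.time_offset_dim + self.readout_dim)


@dataclass
class KswfStateLayout:
    """Hypothetical bookkeeping for the augmented state."""
    imu_param_dim: int = 33  # assumed: b_g(3) + b_a(3) + M_g(9) + T_s(9) + M_a(9)
    cameras: List[CameraBlock] = field(default_factory=lambda: [CameraBlock()])
    n_keyframes: int = 5     # N_kf, arbitrary example value
    n_temporal: int = 3      # N_tf, arbitrary example value
    n_landmarks: int = 0     # 0 corresponds to the structureless variant

    def error_state_dim(self) -> int:
        clones = (self.n_keyframes + self.n_temporal) * NAV_DIM
        cams = sum(c.dim() for c in self.cameras)
        return NAV_DIM + self.imu_param_dim + cams + clones + 3 * self.n_landmarks


mono = KswfStateLayout()
stereo_with_landmarks = KswfStateLayout(cameras=[CameraBlock(), CameraBlock()],
                                        n_landmarks=10)
print(mono.error_state_dim(), stereo_with_landmarks.error_state_dim())
```

Under these assumed sizes the covariance stays small enough (on the order of a hundred-odd rows) for dense EKF algebra, which is one reason the window is kept at a fixed length.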

2. IMU Propagation and Process Model

KSWF propagates the navigation state using a continuous-time IMU-driven process model. The IMU measurement model, governing accelerometer and gyroscope data, includes both additive noise and systematic effects via scale/misalignment matrices and g-sensitivity. The process equations (in the body frame) are:

$$a_t = M_a ( a_m - b_a ), \qquad \omega_t = M_g ( \omega_m - b_g ) - M_g T_s ( a_m - b_a )$$

$$\dot{p}_{WB} = v_{WB},\qquad \dot{v}_{WB} = R_{WB} a_t + g_W,\qquad \dot{R}_{WB} = R_{WB} [\omega_t]_\times$$

where $g_W$ is gravity in the world frame. The state and its covariance are discretized using mid-point or trapezoidal numerical integration, with error-state propagation following the standard extended Kalman filter (EKF) or Lie group error-state transitions.
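The mean propagation described here can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' integrator: the function names (`correct_imu`, `propagate_step`) are hypothetical, rotation is advanced with the averaged angular rate, acceleration is averaged over the two endpoints, and covariance propagation is omitted.

```python
import numpy as np


def skew(w):
    """Cross-product matrix [w]_x."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(phi):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    K = skew(phi)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)


def correct_imu(a_m, w_m, b_a, b_g, M_a, M_g, T_s):
    """Apply the bias, scale/misalignment and g-sensitivity model."""
    a_t = M_a @ (a_m - b_a)
    w_t = M_g @ (w_m - b_g) - M_g @ T_s @ (a_m - b_a)
    return a_t, w_t


def propagate_step(p, R, v, imu0, imu1, dt, calib, g_W):
    """One integration step between two IMU samples (a_m, w_m)."""
    a0, w0 = correct_imu(*imu0, **calib)
    a1, w1 = correct_imu(*imu1, **calib)
    R_new = R @ so3_exp(0.5 * (w0 + w1) * dt)   # averaged angular rate
    acc = 0.5 * ((R @ a0 + g_W) + (R_new @ a1 + g_W))  # world-frame average
    p_new = p + v * dt + 0.5 * acc * dt**2
    v_new = v + acc * dt
    return p_new, R_new, v_new


# Sanity check: an ideal IMU at rest measures -g as specific force.
calib = dict(b_a=np.zeros(3), b_g=np.zeros(3),
             M_a=np.eye(3), M_g=np.eye(3), T_s=np.zeros((3, 3)))
g_W = np.array([0.0, 0.0, -9.81])
at_rest = (np.array([0.0, 0.0, 9.81]), np.zeros(3))
p, R, v = propagate_step(np.zeros(3), np.eye(3), np.zeros(3),
                         at_rest, at_rest, 0.005, calib, g_W)
```

With identity calibration and a stationary sensor, the state should not move, which is a quick check that the sign conventions of $a_t$ and $g_W$ agree.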

Consistency is enforced by using First-Estimate Jacobians (FEJ) for all linearizations involving poses and velocities, maintaining the correct unobservable subspace corresponding to global yaw and position, and preventing linearization artifacts.

3. Visual Measurement and Update Models

The visual frontend detects, tracks, and matches features across temporally- and spatially-separated keyframes. For rolling-shutter cameras, the time at which a feature at image row $v$ is observed is $t_{i,j}^k = t_j^k + (v/h - 1/2)\,t_r^k$, where $h$ is the image height in rows, ensuring sub-row temporal accuracy in reprojection.
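The row-dependent timestamp is simple enough to write out directly. In the sketch below the function and argument names are hypothetical, and `image_height` is assumed to be the quantity written as $h$ in the formula.

```python
def feature_time(t_frame, row, image_height, readout_time):
    """Observation time of a feature at a given image row.

    t_frame is the frame timestamp t_j^k (referenced to the middle row),
    readout_time is the rolling-shutter readout time t_r^k.
    """
    return t_frame + (row / image_height - 0.5) * readout_time


# Example: 480-row image, 30 ms readout, frame stamped at 10.0 s.
t_top = feature_time(10.0, 0, 480, 0.03)      # first row is read earliest
t_mid = feature_time(10.0, 240, 480, 0.03)    # middle row matches the stamp
t_global = feature_time(10.0, 100, 480, 0.0)  # global shutter: t_r = 0
```

Setting the readout time to zero recovers the global-shutter case, so one code path can serve both camera types.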

Feature measurement incorporates an 8-parameter camera model (pinhole projection plus radial/tangential distortion). Each landmark observation yields a residual:

$$r_{i,j}^k = z_{i,j}^k - h^k\big(T_{BC_k}^{-1}\, T_{WB}(t_{i,j}^k)^{-1}\, p_i\big)$$

The update is performed either in a structureless (landmark-marginalized) or landmark-in-state (“structural”) manner:

  • Structureless update: Landmark states are analytically marginalized via nullspace projection, and only the filter state $\delta x$ is updated (Huai et al., 2020).
  • Landmark-in-state: Landmarks are included in the filter state, enabling translation/rotation observability during standstill (Huai et al., 2022).

In both cases, feature tracks are triangulated when their last observation leaves the window, and all Jacobians with respect to navigation, calibration, and landmark parameters are computed to enable joint updates.
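The nullspace projection used by the structureless update can be sketched as follows. This assumes the stacked, linearized residual of one feature track has the form $r \approx H_x\,\delta x + H_f\,\delta p + n$ with $H_f$ of full column rank; the function name and the random test matrices are illustrative only.

```python
import numpy as np


def nullspace_project(H_x, H_f, r):
    """Remove the landmark error from a stacked feature-track residual.

    A complete QR factorization of H_f gives an orthonormal basis N of its
    left nullspace (N.T @ H_f = 0), assuming H_f has full column rank.
    """
    n_f = H_f.shape[1]
    Q, _ = np.linalg.qr(H_f, mode="complete")
    N = Q[:, n_f:]
    return N.T @ H_x, N.T @ r, N


# Synthetic track: 4 observations (8 rows), 15 filter-state columns.
rng = np.random.default_rng(0)
H_x = rng.standard_normal((8, 15))
H_f = rng.standard_normal((8, 3))
dx, df = rng.standard_normal(15), rng.standard_normal(3)
r = H_x @ dx + H_f @ df          # noise-free residual for illustration
H_0, r_0, N = nullspace_project(H_x, H_f, r)
```

The projected pair `(H_0, r_0)` depends only on the filter state, so it can be fed to a standard EKF update without the landmark ever entering the state vector. Each track loses three rows in the process, one per landmark degree of freedom.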

4. Keyframe and Sliding Window Management

Keyframe selection and window maintenance are essential to retain computational tractability while maximizing trajectory coverage and geometric leverage:

  • A new frame becomes a keyframe if its overlap $o_k$ with existing features is below a threshold $T_o$ or the ratio $r_k$ of tracked landmarks to current features falls below $T_r$ (typical: $T_o=60\%$, $T_r=20\%$).
  • The backend window holds $N_{kf}$ keyframes plus the $N_{tf}$ most recent frames. When the window exceeds this size, redundant (typically oldest) frames are identified and removed.
  • Marginalization is performed via the Schur complement on the information matrix. Specifically, for a partitioned information system, marginalized priors are constructed through:

$$\Lambda_m = \Lambda_{11} - \Lambda_{12} \Lambda_{22}^{-1} \Lambda_{21}, \qquad b_m = b_1 - \Lambda_{12} \Lambda_{22}^{-1} b_2$$

These are injected back into the active state as new priors.
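A small NumPy sketch of this Schur-complement step is given below, with block 1 as the kept states and block 2 as the marginalized ones. The function name and the random test system are hypothetical. A linear solve replaces the explicit inverse of $\Lambda_{22}$ for numerical robustness.

```python
import numpy as np


def marginalize(Lam, b, keep, drop):
    """Marginalize the `drop` indices out of the information system (Lam, b)."""
    L11 = Lam[np.ix_(keep, keep)]
    L12 = Lam[np.ix_(keep, drop)]
    L21 = Lam[np.ix_(drop, keep)]
    L22 = Lam[np.ix_(drop, drop)]
    # Solve L22 @ X = [L21 | b2] once instead of forming inv(L22).
    X = np.linalg.solve(L22, np.column_stack([L21, b[drop]]))
    Lam_m = L11 - L12 @ X[:, :-1]
    b_m = b[keep] - L12 @ X[:, -1]
    return Lam_m, b_m


# Random symmetric positive-definite system for illustration.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Lam = A @ A.T + 5.0 * np.eye(5)
b = rng.standard_normal(5)
keep, drop = [0, 1, 2], [3, 4]
Lam_m, b_m = marginalize(Lam, b, keep, drop)
```

The inverse of `Lam_m` equals the kept block of the full covariance `inv(Lam)`. In covariance form, the same marginalization therefore amounts to deleting the rows and columns of the dropped states.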

This keyframe-centric windowing preserves global trajectory and calibration information while maintaining bounded computational complexity.
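The selection rule and window bookkeeping from this section might look like the following sketch. The thresholds are the typical values quoted above. The `SlidingWindow` class and its removal policy (non-keyframes are dropped as they leave the recent-frame buffer, and the oldest keyframe is dropped on overflow) are simplifying assumptions rather than the published redundancy test.

```python
from collections import deque


def is_keyframe(overlap, tracked_ratio, T_o=0.60, T_r=0.20):
    """Keyframe test: low overlap o_k or low tracked-landmark ratio r_k."""
    return overlap < T_o or tracked_ratio < T_r


class SlidingWindow:
    """Hypothetical window of N_kf keyframes plus N_tf recent frames."""

    def __init__(self, n_kf, n_tf):
        self.n_kf, self.n_tf = n_kf, n_tf
        self.keyframes = deque()  # frame ids of retained keyframes
        self.recent = deque()     # (frame_id, is_keyframe) of latest frames

    def add(self, frame_id, overlap, tracked_ratio):
        """Insert a frame; return ids of frames that must be marginalized."""
        removed = []
        self.recent.append((frame_id, is_keyframe(overlap, tracked_ratio)))
        while len(self.recent) > self.n_tf:
            old_id, was_kf = self.recent.popleft()
            if was_kf:
                self.keyframes.append(old_id)  # promoted into the keyframe set
            else:
                removed.append(old_id)         # plain frame leaves the window
        while len(self.keyframes) > self.n_kf:
            removed.append(self.keyframes.popleft())  # assumed: oldest goes first
        return removed


window = SlidingWindow(n_kf=2, n_tf=2)
for fid, (o_k, r_k) in enumerate([(0.5, 0.9), (0.9, 0.9), (0.9, 0.9), (0.9, 0.1)]):
    dropped = window.add(fid, o_k, r_k)
```

The ids returned by `add` are the states that the backend would then marginalize, so the state dimension never exceeds the configured window size.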

5. Online Self-Calibration and Observability

A distinctive property of KSWF is the explicit inclusion of all IMU and camera intrinsics/extrinsics, temporal offsets, and rolling-shutter readout times as state variables. Upon each vision update, the Jacobian of feature reprojection with respect to every calibration parameter is computed (with some efficient simplifications). Over time, as diverse motion excites various system directions, the cross-covariance between navigation states and calibration parameters allows the EKF to infer and converge all parameters to their true values, contingent on sufficient excitation and motion richness.

A rigorous observability analysis via Lie derivatives shows that, under general non-degenerate motion, all IMU and camera intrinsics, the camera-IMU time offset $t_d$, and the rolling-shutter readout time $t_r$ are weakly observable, with the only fundamental unobservables being global yaw, position, and a scale ambiguity if $\|g\|$ is unknown (Huai et al., 2022). Certain parameters (e.g., time delay or scaling terms) become unobservable only in static or single-axis motion regimes; KSWF allows “locking” of such parameters by setting their prior covariance to zero if required.
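To see why a zero prior covariance locks a parameter, consider a generic EKF update: the Kalman gain row of a state whose covariance row and column are zero is itself zero, so the state never moves. The sketch below uses a hypothetical three-element state (position, velocity, time offset) and made-up numbers purely for illustration.

```python
import numpy as np


def ekf_update(x, P, H, r, R_noise):
    """Standard EKF update for residual r = z - h(x)."""
    S = H @ P @ H.T + R_noise
    K = np.linalg.solve(S, H @ P).T      # K = P H^T S^{-1} (S and P symmetric)
    x_new = x + K @ r
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new


def lock(P, idx):
    """Zero the covariance rows/columns of the parameters in idx."""
    P = P.copy()
    P[idx, :] = 0.0
    P[:, idx] = 0.0
    return P


x = np.array([1.0, 0.5, 0.01])           # hypothetical: position, velocity, t_d
P = np.array([[0.04, 0.01, 0.001],
              [0.01, 0.09, 0.0005],
              [0.001, 0.0005, 1e-4]])
H = np.array([[1.0, 0.5, 2.0]])
r = np.array([0.3])
R_noise = np.array([[0.01]])

x_free, _ = ekf_update(x, P, H, r, R_noise)
x_lock, P_lock = ekf_update(x, lock(P, [2]), H, r, R_noise)
```

In the unlocked case the time offset absorbs part of the residual. In the locked case it stays at its prior value and the remaining states take up the full correction.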

6. Robustness to Standstill and Drift Mitigation

Standard structureless sliding-window filters (e.g., MSCKF, SL-SWF) suffer from loss of observability and drift during standstill or low-motion periods because all updates derive from relative observations over a short window. KSWF counters this through two core mechanisms:

  • Landmarks retained in the state vector (as in ROVIO or KSWF with landmarks) encode geometric constraints from older keyframes, preserving memory of past structure and preventing drift.
  • Keyframe-based matching ensures that even at standstill, large-baseline correspondences extend across the window, enabling reliable translation and rotation observability (Huai et al., 2022).

Empirical results indicate that KSWF and related sliding-window filters keep the velocity estimate correct and the pose locked in place during standstill, whereas conventional structureless methods exhibit drift.

7. Computational Complexity and Performance

KSWF is engineered for real-time operation. On a typical 6-core Intel i7 platform, processing EuRoC stereo streams with $752\times480$ images and 400 features per frame, KSWF achieves 42 Hz (monocular: 125 Hz). The main computational costs are the EKF update (approx. 13 ms per frame) and marginalization (approx. 5 ms per frame), with feature matching consuming an additional 10 ms (Huai et al., 2020).

Extensive evaluation on benchmarks including EuRoC and TUM VI shows that KSWF with full self-calibration achieves root-mean-square errors and calibration accuracy matching or exceeding established methods (e.g., OKVIS, VINS-Mono, OpenVINS), and recovers all time-invariant parameters to reference values under sufficient motion excitation (Huai et al., 2022).

Summary Table: Key KSWF State Components

| Component | Description | Included Parameters |
| --- | --- | --- |
| Navigation ($\pi$) | Current pose and velocity (IMU/body in world) | $p_{WB}(t)$, $R_{WB}(t)$, $v_{WB}(t)$ |
| IMU params ($x_S$) | IMU systematic error models | $b_g$, $b_a$, $M_g$, $T_s$, $M_a$ |
| Camera params ($x_C$) | Camera intrinsic/extrinsic, time/rolling offsets | $T_{BC_k}$, intrinsics, $t_d^k$, $t_r^k$ |
| Window ($x_W$) | Sliding window of navigation “clones” | $[\pi(t_j)]$ for $j=0,\dots,N_{kf}+N_{tf}-1$ |
| Landmarks ($x_L$) | Optional, for landmark-based update variants | 3D points (e.g., inverse depth anchoring) |

KSWF’s architecture allows for flexible switching between structureless and landmark-in-state updates, provides concrete guarantees on self-calibration, and maintains estimator consistency and accuracy in a broad array of VIO conditions (Huai et al., 2020, Huai et al., 2022).
