A Minimal Solver for Relative Pose Estimation with Unknown Focal Length from Two Affine Correspondences

Published 28 Dec 2025 in cs.CV | (2512.22833v1)

Abstract: In this paper, we aim to estimate the relative pose and focal length between two views with known intrinsic parameters except for an unknown focal length from two affine correspondences (ACs). Cameras are commonly used in combination with inertial measurement units (IMUs) in applications such as self-driving cars, smartphones, and unmanned aerial vehicles. The vertical direction of camera views can be obtained by IMU measurements. The relative pose between two cameras is reduced from 5DOF to 3DOF. We propose a new solver to estimate the 3DOF relative pose and focal length. First, we establish constraint equations from two affine correspondences when the vertical direction is known. Then, based on the properties of the equation system with nontrivial solutions, four equations can be derived. These four equations only involve two parameters: the focal length and the relative rotation angle. Finally, the polynomial eigenvalue method is utilized to solve the problem of focal length and relative rotation angle. The proposed solver is evaluated using synthetic and real-world datasets. The results show that our solver performs better than the existing state-of-the-art solvers.


Summary

The paper introduces a minimal solver for 3-DOF relative pose and focal length estimation using just two affine correspondences and IMU-derived vertical information.
The methodology leverages Cayley parameterization and transforms the problem into a polynomial eigenvalue formulation, enhancing numerical stability and computational efficiency.
Experimental results demonstrate that the proposed approach reduces errors in focal, rotation, and translation estimates by one to two orders of magnitude compared to existing methods on both synthetic and real datasets.

Minimal Solver for Relative Pose with Unknown Focal Length from Two Affine Correspondences under Known Vertical

Problem Setting and Prior Work

Relative pose estimation with unknown focal length is central to monocular SLAM, SfM, and VO, especially in robotics and autonomous driving. Traditional solutions require 6 point correspondences for semi-calibrated cameras with unknown focal length, or 5 for essential matrix estimation in fully calibrated scenarios. Recent research instead leverages affine correspondences (ACs), which encode both a point match and the local image Jacobian, to reduce the minimal sample size required for robust estimation. At the same time, cameras are widely deployed together with IMUs on self-driving and mobile platforms, which makes the vertical direction available and reduces the degrees of freedom (DOF) of the minimal problem.

Prior solvers either operate under more restrictive motion assumptions (planarity, e.g. [21]), require more samples (such as 4-pt solvers with IMU, e.g. [26]), or do not exploit IMU priors to reduce the DOF further. Polynomial system-solving approaches, including Gröbner basis and polynomial eigenvalue methods, remain the core algebraic tool for such minimal solvers. The challenge is to reduce the sample size further, maximize numerical stability, and remain applicable to arbitrary motion, all with minimal algebraic and computational complexity.

Methodology

The proposed solver estimates the 3-DOF relative pose and the focal length in the semi-calibrated two-view setting, given only two ACs and knowledge of the vertical direction (from an IMU). The approach combines the constraints provided by the ACs with the DOF reduction made possible by the IMU: because roll and pitch are supplied by the IMU, the only remaining rotational unknown is a single angle about the vertical axis.
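The role of the known vertical can be illustrated with a small sketch. It assumes the IMU provides a gravity direction expressed in the camera frame and that the y-axis is taken as the vertical; both are assumptions made here for illustration, and the function name `align_to_vertical` is hypothetical rather than taken from the paper.

```python
import numpy as np

def align_to_vertical(g, vertical=(0.0, 1.0, 0.0)):
    """Return a rotation R such that R @ (g / |g|) equals `vertical`.

    g        : gravity direction measured in the camera frame (assumed input).
    vertical : axis treated as "up/down" after alignment (assumed y-axis).
    """
    a = np.asarray(g, dtype=float)
    a = a / np.linalg.norm(a)
    b = np.asarray(vertical, dtype=float)
    v = np.cross(a, b)
    c = float(a @ b)
    if np.isclose(c, -1.0):
        # g points exactly opposite to `vertical`: rotate by pi about
        # any axis perpendicular to `vertical`.
        axis = np.cross(b, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(b, [0.0, 0.0, 1.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues-style formula for the rotation taking unit vector a to b.
    return np.eye(3) + vx + vx @ vx / (1.0 + c)
```

If `R1` and `R2` are the alignments of the two views, the full relative rotation can be written as `R2.T @ R_y(theta) @ R1`, where `R_y(theta)` is a rotation about the vertical. Only `theta` is left to estimate, which is the reduction described above.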

Geometric Constraints Derivation

Each AC provides three independent equations: one from the point match and two from the local affine mapping. Two ACs therefore give six equations in the unknowns: the rotation angle about the vertical, the focal length, and the translation up to scale.
Using the Cayley parameterization for the rotation about the vertical, the constraints are converted into determinant equations, producing a system in two variables: the relative rotation parameter $s = \tan(\theta/2)$ and the unknown focal length $f$.
The constraint matrix, after elimination, yields four cubic/quintic determinant equations in $s$ and $f$, with maximal degrees $s^6f^4$ and $s^6f^5$.
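The three equations per AC and the Cayley form of the rotation can be sketched as follows. This is a sketch under stated assumptions, not the paper's exact formulation: it takes the y-axis as the vertical, uses the same focal length in both views with the principal point at the image origin ($K = \mathrm{diag}(f, f, 1)$), and writes the affine constraints in the form obtained by differentiating the epipolar constraint, $A^\top (F p_1)_{(1:2)} + (F^\top p_2)_{(1:2)} = 0$. All function names are hypothetical.

```python
import numpy as np

def rot_y_cayley(s):
    """Rotation about the y-axis (assumed vertical) with s = tan(theta / 2)."""
    return np.array([[1.0 - s**2, 0.0, 2.0 * s],
                     [0.0, 1.0 + s**2, 0.0],
                     [-2.0 * s, 0.0, 1.0 - s**2]]) / (1.0 + s**2)

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(s, t, f):
    """F = K^-T [t]_x R_y(s) K^-1 with K = diag(f, f, 1) (assumed intrinsics)."""
    K_inv = np.diag([1.0 / f, 1.0 / f, 1.0])
    return K_inv.T @ skew(t) @ rot_y_cayley(s) @ K_inv

def ac_residuals(F, p1, p2, A):
    """Three residuals of one affine correspondence (p1, p2, A).

    p1, p2 : 2-vectors, matched image points in views 1 and 2.
    A      : 2x2 local affine map (Jacobian of the mapping p1 -> p2).
    """
    p1h = np.append(p1, 1.0)
    p2h = np.append(p2, 1.0)
    epipolar = p2h @ F @ p1h                         # point constraint
    affine = A.T @ (F @ p1h)[:2] + (F.T @ p2h)[:2]   # two affine constraints
    return np.concatenate([[epipolar], affine])
```

Stacking `ac_residuals` for two ACs gives six equations that are linear in the translation `t`; requiring a nontrivial `t` is what leads to determinant conditions in $s$ and $f$ only.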

Polynomial Eigenvalue Solution

The determinant constraints are linear in the monomials of $f$ for fixed $s$, so the relative pose estimation reduces to a polynomial eigenvalue problem.
The system is reformulated as $B'(s)J = 0$, where $B'(s)$ is a matrix polynomial in $s$ (degree 6) and $J$ is the vector of monomials in $f$ (up to degree 5).
The eigenvalues, which give the candidate values of $s$, are found by forming a companion-like matrix and solving via Schur decomposition; the corresponding $f$ is obtained from the resulting nullspace. Invalid or degenerate solutions are pruned using the structure of $J$.
The translation direction is recovered from the nullspace of a reduced matrix once $(s, f)$ is fixed.
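A generic polynomial eigenvalue problem of the form $(C_0 + sC_1 + \dots + s^dC_d)\,v = 0$ can be solved by a block-companion linearization, sketched below. The sketch makes two assumptions that are not taken from the paper: the leading coefficient $C_d$ is invertible, and a general eigendecomposition (`numpy.linalg.eig`) is used in place of the Schur decomposition mentioned above. The function name `polyeig_companion` is hypothetical, and the paper's actual matrices would take the place of `coeffs`.

```python
import numpy as np

def polyeig_companion(coeffs):
    """Solve (C_0 + s C_1 + ... + s^d C_d) v = 0 for pairs (s, v).

    coeffs : list [C_0, ..., C_d] of square (n x n) arrays.
             Assumes C_d is invertible (an assumption of this sketch).
    Returns (eigenvalues, vectors) where vectors[:, i] pairs with eigenvalues[i].
    """
    coeffs = [np.asarray(c, dtype=float) for c in coeffs]
    d = len(coeffs) - 1
    n = coeffs[0].shape[0]
    lead = coeffs[-1]

    # With z = [v, s v, ..., s^(d-1) v], the problem becomes M z = s z.
    M = np.zeros((n * d, n * d))
    M[:n * (d - 1), n:] = np.eye(n * (d - 1))        # shift blocks
    for k in range(d):
        M[n * (d - 1):, n * k:n * (k + 1)] = -np.linalg.solve(lead, coeffs[k])

    vals, vecs = np.linalg.eig(M)
    return vals, vecs[:n, :]                          # first block of z is v
```

In the setting described above, the eigenvector would play the role of $J$, so a candidate $f$ could be read off as the ratio of two neighbouring monomial entries, and only real, positive candidates would be kept. That post-processing is a plausible reading of the pruning step, not a detail confirmed by the summary.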

Experimental Results

Experiments on synthetic and real data demonstrate the method's numerical advantage.

Synthetic Data

Numerical stability (median errors over 1000 runs, noise-free and noisy settings) is consistently superior: the method's errors (focal length, rotation, translation) are 1–2 orders of magnitude lower than baselines (Bara-2AC-5DOF-f [47], Ding-4PC-3DOF-f [26], Kuke-6PC-5DOF-f [15]).
The method is robust to image noise and to affine and IMU noise, is resilient to principal point perturbation, and consistently outperforms the other solvers.

KITTI, Smartphone, and Vehicular Datasets

On KITTI, the method achieves the lowest mean errors in all metrics across 11 sequences, with a marked reduction over other state-of-the-art schemes (see Tables II/III/IV in the paper).
On collected smartphone and vehicular platform datasets, the scheme delivers the lowest average focal length, rotation, and translation errors. The polynomial eigenvalue method is more computationally efficient in RANSAC than Gröbner-based approaches, reaching runtimes as low as 0.05s (vehicle data).

Theoretical and Practical Implications

A minimal solver that requires only two ACs under a known vertical substantially improves estimator robustness within the RANSAC framework. A smaller minimal sample raises the probability of drawing an all-inlier sample and therefore lowers the number of iterations needed, which gives more reliable model selection when the outlier rate is high. This is crucial for applications such as autonomous robotics, AR, and real-time mapping.
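The effect of sample size on RANSAC cost can be made concrete with the standard iteration-count formula $N = \lceil \log(1-p) / \log(1-w^m) \rceil$, where $w$ is the inlier ratio, $m$ the minimal sample size, and $p$ the desired confidence. The sketch below is a generic illustration; the inlier ratio and confidence are assumed values, not figures from the paper.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample
    with the given confidence (standard RANSAC formula)."""
    p_good = inlier_ratio ** sample_size   # chance one sample is all inliers
    if p_good <= 0.0:
        return math.inf
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))
```

With an assumed inlier ratio of 0.5 and 99% confidence, this gives 17 iterations for a 2-sample solver, 72 for a 4-sample solver, and 293 for a 6-sample solver. The comparison treats the inlier ratio as equal across correspondence types, which need not hold in practice for ACs versus point correspondences.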

By leveraging IMU priors without sacrificing support for arbitrary relative motion, this method generalizes better than previous planar or more restrictive approaches [21], [26]. Additionally, the purely geometric nature circumvents the generalization and resource issues associated with recent deep learning-based pose approaches.

The polynomial eigenvalue-based solution offers both improved numerical stability and practical implementation; unlike Gröbner basis schemes, it does not require expert knowledge of algebraic geometry and lends itself to solver automation.

Directions for Future Research

Potential routes for further development include extending the work to robust estimation of lens distortion or handling mismatched or partially corrupted ACs (possibly integrating learning-based pseudocorrespondence filtering). Generalization to uncalibrated settings or to variable focal settings (zooming and rolling shutter effects) could also drive future advancements. Tighter IMU–camera extrinsic calibration and incremental multi-view scenarios are likely application areas. Hybrid analytic/deep methods that retain geometric interpretability while leveraging data-driven denoising for AC estimation are another promising direction.

Conclusion

This paper presents a minimal, numerically stable solver for semi-calibrated relative pose and focal length estimation using only two affine correspondences under a known vertical direction provided by an IMU. The method establishes novel determinant-based constraints and solves the resulting polynomial system via an efficient eigenvalue approach. Extensive validation under varied conditions demonstrates clear gains over existing approaches in both accuracy and efficiency, supporting its adoption in visual-inertial navigation and mapping systems (2512.22833).
