FisheyeGaussianLift: 3D Reconstruction from Fisheye
- FisheyeGaussianLift is a framework that converts distorted fisheye imagery into explicit 3D or BEV Gaussian representations for accurate scene reconstruction and segmentation.
- It adapts geometric lifting and differentiable rendering to correct wide-angle distortions, using anisotropic 3D Gaussian primitives and calibrated projection models.
- Real-time GPU implementations demonstrate enhanced performance in autonomous driving, robotics, and 360° scene analysis by improving fidelity and semantic understanding.
FisheyeGaussianLift refers to a class of algorithms and frameworks for lifting distorted fisheye imagery into explicit 3D or BEV (bird’s-eye view) Gaussian representations, enabling high-fidelity reconstruction, novel view synthesis, and semantic understanding directly from wide-FOV, distortion-heavy inputs. These methods address the incompatibilities between traditional 3D Gaussian Splatting (3DGS), which assumes a pinhole camera model, and the unique projections of fisheye or omnidirectional sensors. FisheyeGaussianLift encompasses approaches that adapt both the geometric lifting (“unprojection”) and the differentiable rendering pipeline to wide-angle and non-pinhole lenses, supporting advances in autonomous driving, robotics, 360° scene rendering, and scene segmentation.
1. Mathematical Foundations and Gaussian Parameterization
FisheyeGaussianLift algorithms jointly model scene geometry and uncertainty using explicit anisotropic 3D Gaussian primitives. Each primitive is defined by a center $\mu \in \mathbb{R}^3$, a symmetric positive-definite covariance $\Sigma$, view-dependent color (spherical-harmonic coefficients, or simply RGB $c$), and an opacity parameter $\alpha \in (0,1]$. The (unnormalized) density is given by

$$G(x) = \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right).$$
Color is typically modeled via spherical harmonics or per-ray direction mappings. For semantic lifting (e.g., BEV), per-pixel feature vectors and their associated uncertainty-aware covariances are also predicted (Sonarghare et al., 21 Nov 2025).
This parameterization is propagated through the entire rendering or segmentation pipeline in a fully differentiable fashion, allowing optimization via gradient descent. Covariance updates are managed by eigendecomposition or affine pushforward under the projection's Jacobian (Ren et al., 2024, Liao et al., 2024).
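A minimal sketch of the two operations above, the unnormalized anisotropic density and the first-order (affine) covariance pushforward under a projection Jacobian; the diagonal covariance and the pinhole-like Jacobian are illustrative values, not taken from any of the cited systems:

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Unnormalized anisotropic Gaussian density G(x) = exp(-0.5 d^T Sigma^-1 d)."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d))

def pushforward_covariance(sigma, jac):
    """Affine pushforward of a 3D covariance through a projection Jacobian J:
    Sigma' = J Sigma J^T (a 2x3 J yields a 2x2 image-plane covariance)."""
    return jac @ sigma @ jac.T

mu = np.array([0.0, 0.0, 2.0])
sigma = np.diag([0.04, 0.04, 0.10])     # anisotropic: elongated along depth
print(gaussian_density(mu, mu, sigma))  # density at the center is exp(0) = 1.0

J = np.array([[1.0 / mu[2], 0.0, 0.0],
              [0.0, 1.0 / mu[2], 0.0]])  # pinhole-like Jacobian, for illustration only
print(pushforward_covariance(sigma, J).shape)  # (2, 2)
```

Because both operations are compositions of differentiable primitives, gradients flow through them with any autodiff framework; the numpy version here is only for exposition.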
2. Fisheye Camera Models and Geometric Lifting
Standard projection schemes (pinhole, perspective) become inaccurate with wide-FOV or fisheye lenses, necessitating explicit modeling of lens distortions. FisheyeGaussianLift relies on calibrated or learnable models such as:
- Equidistant Model: $r(\theta) = f\theta$ for incidence angle $\theta$; a linear mapping of angle to radial distance (Liao et al., 2024, Gunes et al., 9 Aug 2025).
- Polynomial/Radial Distortion (e.g., Kannala-Brandt): $r(\theta) = k_1\theta + k_2\theta^3 + k_3\theta^5 + \cdots$, supporting parameterized nonlinearities (Ren et al., 2024, Sonarghare et al., 21 Nov 2025).
- Dual-fisheye and ERP (Equirectangular Projection): Mapping between panorama and spherical coordinates with per-pixel sampled rays, and learnable angular distortions for lens stitching (Shin et al., 27 Aug 2025).
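The first two radial models above can be sketched as plain forward mappings from incidence angle to radial image distance; the focal length and polynomial coefficients below are placeholder values for illustration:

```python
import math

def project_equidistant(theta, f):
    """Equidistant fisheye: radial distance grows linearly with incidence angle."""
    return f * theta

def project_kannala_brandt(theta, k):
    """Kannala-Brandt-style odd polynomial: r(theta) = k1*theta + k2*theta^3 + ..."""
    return sum(ki * theta ** (2 * i + 1) for i, ki in enumerate(k))

# A ray along the optical axis (theta = 0) lands at the principal point (r = 0).
print(project_equidistant(0.0, 300.0))  # 0.0

# With k = [1, 0, 0, 0] the polynomial reduces to the ideal equidistant model (f = 1).
print(project_kannala_brandt(math.pi / 4, [1.0, 0.0, 0.0, 0.0]))
```

Note how the polynomial model degenerates to the equidistant one when only the linear coefficient is nonzero, which is why the latter is often used as an initialization for calibration.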
Geometric lifting proceeds by inverting the projection per-pixel via LUTs or analytically, assigning each pixel a calibrated 3D ray $r(u,v)$ and a predicted depth $z(u,v)$, then computing the corresponding 3D Gaussian mean $\mu = z(u,v)\,r(u,v)$ and a covariance propagated from the depth and angular uncertainty along that ray (Sonarghare et al., 21 Nov 2025).
This fidelity in geometric unprojection is central to avoiding undistortion artifacts and utilizing the full field of view.
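The per-pixel lift can be sketched as follows for the equidistant model, whose inverse is analytic ($\theta = r/f$); the principal point and focal length here are hypothetical intrinsics, not calibrated values from any cited dataset:

```python
import math

def lift_pixel(u, v, depth, cx, cy, f):
    """Pixel -> calibrated unit ray -> 3D point, inverting the equidistant model.

    (cx, cy) is the principal point and f the focal length, both hypothetical here.
    """
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    theta = r / f  # equidistant inverse: incidence angle is linear in radius
    if r == 0.0:
        ray = (0.0, 0.0, 1.0)  # optical axis
    else:
        s = math.sin(theta) / r
        ray = (dx * s, dy * s, math.cos(theta))  # unit-norm direction
    return tuple(depth * c for c in ray)

# The center pixel lifts straight along the optical axis to the predicted depth.
print(lift_pixel(320.0, 240.0, 2.0, 320.0, 240.0, 300.0))  # (0.0, 0.0, 2.0)
```

The resulting point becomes the Gaussian mean; in uncertainty-aware variants the covariance is then stretched along this ray in proportion to the depth uncertainty.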
3. Differentiable Rendering and Splatting
Differentiable rendering under FisheyeGaussianLift involves the projection of 3D Gaussians into 2D (or BEV) under the distortion-aware model, rasterizing their effect as anisotropic elliptical "splats". The core steps are:
- Affine Warping: Apply rotation and stretching to Gaussian center and covariance to account for ray bending and field-of-view distortions (polar/tangential stretching) (Ren et al., 2024).
- Projection and Splatting: Calculate 2D location and image-plane covariance via the model Jacobian. Splat using elliptical weighted averages or kernel integration (Liao et al., 2024, Huang et al., 29 May 2025).
- Alpha-Blending/Compositing: Front-to-back (transmittance-aware) compositing for color accumulation (Ren et al., 2024), or kernel-weighted summation for features in BEV contexts (Sonarghare et al., 21 Nov 2025).
- Gradient Backpropagation: All transformation and rendering steps expose gradients (through quaternion operations, stretching, and Jacobian evaluation), enabling end-to-end training or optimization.
- Fast Association (PBF/BEAP): Efficient mapping of Gaussians to rays in large-FOV settings by bounding frusta in angular domains rather than image-plane AABBs (Particle Bounding Frustum, Bipolar Equiangular Projection) (Huang et al., 29 May 2025).
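The compositing step above can be sketched as scalar front-to-back alpha blending; for simplicity this assumes each splat's opacity has already been modulated by its 2D Gaussian weight at the pixel, and uses a single color channel:

```python
def composite_front_to_back(splats):
    """Transmittance-aware alpha blending of depth-sorted splats.

    Each splat is a (color, alpha) pair, assumed sorted near-to-far, with alpha
    already modulated by the elliptical 2D Gaussian weight at this pixel.
    """
    color, transmittance = 0.0, 1.0
    for c, a in splats:
        color += transmittance * a * c
        transmittance *= (1.0 - a)
        if transmittance < 1e-4:  # early termination, as in tile-based 3DGS rasterizers
            break
    return color

# A fully opaque front splat returns its own color; anything behind it is occluded.
print(composite_front_to_back([(0.8, 1.0), (0.2, 1.0)]))  # 0.8

# Two half-opaque white splats: 0.5 + 0.5 * 0.5 = 0.75.
print(composite_front_to_back([(1.0, 0.5), (1.0, 0.5)]))  # 0.75
```

Every operation here is differentiable in (color, alpha), which is what lets the full pipeline backpropagate through blending to the Gaussian parameters.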
4. Multi-View and Multi-Modal Scene Unification
FisheyeGaussianLift enables unification of multiple sensor types by sharing a common set of 3D Gaussians across all camera models (pinhole and fisheye), adapting their projection at render time via analytic or learned warps (Ren et al., 2024, Shin et al., 27 Aug 2025). Supervision can be constructed from:
- Photometric loss on rendered pixel colors
- Similarity terms (e.g., SSIM) for image-level quality
- Cross-entropy/IoU-based segmentation loss in BEV applications
- Depth and geometric regularization, enforcing consistency between modality-specific views (LiDAR, depth, normal, semantics)
- Calibration variable regularization when jointly optimizing lens translation and stitching corrections (Shin et al., 27 Aug 2025)
Consequently, the learned Gaussian cloud encodes a multi-view, multi-modal scene, supporting both reconstruction and semantic segmentation.
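A minimal sketch of how such supervision terms are combined, mixing a photometric L1 term with a segmentation cross-entropy; the weights, shapes, and function name are hypothetical, and real pipelines add SSIM, depth, and calibration regularizers as listed above:

```python
import numpy as np

def total_loss(rendered, target, seg_logits, seg_labels, w_photo=0.8, w_ce=0.2):
    """Hypothetical weighted multi-task loss (illustrative weights, not from the papers)."""
    photo = np.abs(rendered - target).mean()  # photometric L1 on rendered colors

    # Softmax cross-entropy over the class axis; labels are integer class ids.
    z = seg_logits - seg_logits.max(axis=-1, keepdims=True)  # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    ce = -log_p[np.arange(len(seg_labels)), seg_labels].mean()

    return w_photo * photo + w_ce * ce

img = np.zeros((4, 4))
logits = np.array([[10.0, -10.0], [-10.0, 10.0]])  # confidently correct predictions
labels = np.array([0, 1])
print(total_loss(img, img, logits, labels))  # ~0: both terms vanish
```

In practice each term is computed by the differentiable renderer described in Section 3, so a single backward pass updates the shared Gaussian cloud from all modalities at once.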
5. Practical Implementations and Computational Performance
Frameworks such as UniGaussian (Ren et al., 2024), Fisheye-GS (Liao et al., 2024), FisheyeGaussianLift-BEV (Sonarghare et al., 21 Nov 2025), and Seam360GS (Shin et al., 27 Aug 2025) implement FisheyeGaussianLift on top of established 3DGS pipelines (often in PyTorch, CUDA). Key implementation points:
- Lightweight kernel replacement: Only the projection and its derivatives need adaptation for fisheye, leaving the tile-binning, sorting, and blending intact (Fisheye-GS: +60 lines of CUDA) (Liao et al., 2024).
- Efficient handling of peripheral distortion: Affine approximation suffices for small Gaussians, but methods like 3DGEER utilize closed-form integrals for exactness and real-time rates even on extreme FOVs (Huang et al., 29 May 2025).
- Real-time performance: FlashGS + Fisheye-GS yields ~405 FPS on an A100 GPU at 1752×1168 (Liao et al., 2024); 3DGEER achieves ~251 FPS at state-of-the-art quality (Huang et al., 29 May 2025).
- Memory/compute: Asymptotic complexity matches pinhole 3DGS, with marginal overhead in practice (roughly a 10–15% increase in Gaussian–ray intersections).
6. Quantitative Results and Empirical Validation
FisheyeGaussianLift approaches consistently outperform undistort-then-3DGS and pinhole-based baselines, especially in peripheral fidelity and semantic robustness. Selected results:
| Dataset / Metric | FisheyeGaussianLift Variant | PSNR (dB) | SSIM | LPIPS |
|---|---|---|---|---|
| KITTI-360 fisheye (UniGaussian) | Warped affine + stretch (Ren et al., 2024) | 26.19 | 0.897 | 0.185 |
| KITTI-360 pinhole+fisheye (HUGS++) | (Ren et al., 2024) | 25.2–26.2 | — | — |
| ScanNet++ full FoV (3DGEER) | (Huang et al., 29 May 2025) | 31.50 | 0.953 | 0.126 |
| ScanNet++ peripheral (3DGEER) | (Huang et al., 29 May 2025) | 28.94 | 0.945 | — |
| Parking BEV IoU (FisheyeGaussianLift) | Drivable (Sonarghare et al., 21 Nov 2025) | 87.75% | — | — |
| Parking BEV IoU (FisheyeGaussianLift) | Vehicle (Sonarghare et al., 21 Nov 2025) | 57.26% | — | — |
PSNR and SSIM improvements are robust to FOV and initialization method (SfM- vs. depth-based); the "sweet spot" FOV for Fisheye-GS is ~160° (Gunes et al., 9 Aug 2025). Methods that properly model distortion avoid artifacts such as halo rings and peripheral clipping.
7. Extensions, Limitations, and Outlook
FisheyeGaussianLift is extensible to any central projection model by supplying analytic projection and derivatives; frameworks are camera-model-agnostic aside from the initial “lift” mapping (Liao et al., 2024). Dual-fisheye/ERP approaches extend to 360° content, with learnable calibration for lens gap and stitching (Shin et al., 27 Aug 2025). BEV segmentation instantiations explicitly propagate depth and lift-induced uncertainty through anisotropic Gaussian covariances (Sonarghare et al., 21 Nov 2025).
Challenges remain in initialization for scenes with strong distortion or SfM failure; depth-based approaches (e.g., UniK3D) offer high-density alternatives. Performance remains real-time under GPU implementations for both 3D reconstruction and segmentation settings.
FisheyeGaussianLift tightly integrates probabilistic geometric modeling, differentiable rendering, and explicit handling of distortion, enabling state-of-the-art novel view synthesis, scene reconstruction, and semantic segmentation across wide-angle, multi-view, and omnidirectional camera systems (Sonarghare et al., 21 Nov 2025, Ren et al., 2024, Liao et al., 2024, Huang et al., 29 May 2025, Shin et al., 27 Aug 2025, Gunes et al., 9 Aug 2025).