Metric Depth from Focused Plenoptic Cameras

Updated 7 February 2026
  • The paper introduces a single-shot metric depth estimation technique using a focused plenoptic camera, leveraging calibrated light-field encoding and stereo triangulation principles.
  • It details both classical model-based and recent deep learning pipelines to extract dense or sparse depth maps from micro-image arrays with sub-decimeter accuracy.
  • Empirical results validate the approach while highlighting calibration challenges and processing constraints, with promising applications in robotics, automotive, and medical imaging.

Single-shot metric depth estimation from focused plenoptic cameras recovers quantitative scene geometry, measured in metric units, from a single capture. Focused, or "type 2.0," plenoptic cameras combine a main lens with a micro-lens array (MLA) to record both spatial and angular light-field information, allowing the extraction of dense or sparse depth maps with true scale. This approach addresses longstanding limitations of conventional stereo and monocular setups: it recovers metric depth without a multi-camera baseline and without the unobservable scale of a single conventional view. The following sections cover the foundational camera geometries, triangulation principles, algorithmic pipelines, recent learning-based advances, calibration and scaling strategies, and the state of empirical validation in this research domain.

1. Focused Plenoptic Camera Geometry and Light-Field Encoding

A focused plenoptic camera consists of a main lens (objective) and an MLA meticulously positioned to dissect the main-lens image into an array of spatially localized micro-images. The main lens, typically with focal length $f$ or $f_U$, projects a continuous image onto the MLA, which itself consists of micro-lenses of focal length $f_{ml}$ or $f_s$ and pitch $p_M$. Each micro-lens forms a micro-image on the underlying sensor, whose pitch $p_p$ defines the system’s ultimate spatial/angular resolution.

The commonly adopted two-plane parameterization encodes the light field as $L(x, y, u, v)$, where $(x, y)$ index spatial samples (micro-lens center positions) and $(u, v)$ index angular samples (typically corresponding to pixel positions within each micro-image). Calibration routines map raw sensor coordinates $(x_k, y_\ell)$ to light-field indices $(s_j, u_{c+i})$ according to known MLA and sensor geometry. This setup enables each scene point to be imaged from multiple directions, facilitating subsequent multi-view correspondence and stereo-based triangulation (Hahne et al., 2020, Lasheras-Hernandez et al., 2024).
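
As a rough illustration of this index mapping, the sketch below assumes an idealized MLA: a square micro-lens grid perfectly aligned with the sensor, with every micro-image spanning an integer number `M` of pixels. Real calibrations also have to handle grid rotation, sub-pixel micro-image centers, and hexagonal layouts. The function name and its parameters are hypothetical and do not come from the cited works.

```python
def sensor_to_lightfield(x_k, y_l, M):
    """Map a raw sensor pixel (x_k, y_l) to light-field indices.

    Assumes (for illustration only) a square micro-lens grid aligned with
    the sensor, where every micro-image covers exactly M x M pixels.
    Returns (s, t, u, v): micro-lens indices (s, t) and angular offsets
    (u, v) measured from the micro-image center c = (M - 1) / 2.
    """
    s, px = divmod(x_k, M)   # micro-lens column, pixel column inside it
    t, py = divmod(y_l, M)   # micro-lens row, pixel row inside it
    c = (M - 1) / 2          # micro-image center in local pixel units
    return s, t, px - c, py - c


# Pixel (17, 4) on a sensor with 5 x 5-pixel micro-images
print(sensor_to_lightfield(17, 4, 5))  # (3, 0, 0.0, 2.0)
```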

2. Metric Depth Recovery via Disparity and Triangulation

Metric depth in plenoptic systems is fundamentally tied to the geometric relation between micro-lens (virtual viewpoint) positions and sub-aperture disparities. The standard triangulation formula for stereo depth estimation, $Z = \frac{bB}{\Delta x}$, is reinterpreted by treating pairs of micro-images as views from a virtual camera array with baseline $B_G$. For focused plenoptic cameras:

  • The effective baseline $B_G$ is determined by the MLA configuration and main-lens entrance pupil mapping.
  • The metric depth at disparity $\Delta x$ between two virtual views separated by $G$ micro-lens steps is given by

$$Z_{G, \Delta x} = \frac{b_N B_G}{\Delta x\,p_N + b_N \tan\Phi_G}$$

where $b_N$ is the virtual image distance, $p_N$ is the pixel scale, and $\Phi_G$ is a tilt correction (for non-parallel virtual axes). For parallel virtual axes, this simplifies to $Z = \frac{b_N B_G}{\Delta x\,p_N}$.

  • Disparity extraction is performed either by direct block-matching or more advanced cost aggregation algorithms such as semi-global matching (SGM), operating between sub-aperture images synthesized from the plenoptic 4D light field (Hahne et al., 2020, Lasheras-Hernandez et al., 2024).
  • In recent deep learning-based pipelines, geometric triangulation may be used implicitly to supervise networks trained on metric labels, allowing the sparse or dense recovery of absolute scene scale (Lasheras-Hernandez et al., 2024).
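
The triangulation formula of this section can be evaluated directly. The following is a minimal sketch under stated assumptions: the function name, argument names, and the numeric values in the usage lines are hypothetical, all lengths share one unit, and the tilt angle is taken in radians.

```python
import math


def metric_depth(disparity_px, b_N, B_G, p_N, phi_G=0.0):
    """Depth Z = b_N * B_G / (disparity * p_N + b_N * tan(phi_G)).

    disparity_px : disparity between the two virtual views, in pixels
    b_N          : virtual image distance (same length unit as B_G, p_N)
    B_G          : virtual baseline for a gap of G micro-lenses
    p_N          : pixel scale (length per pixel)
    phi_G        : tilt of the virtual optical axes, in radians
    """
    denom = disparity_px * p_N + b_N * math.tan(phi_G)
    if denom == 0:
        return math.inf  # zero effective disparity: point at infinity
    return b_N * B_G / denom


# Hypothetical values in millimeters, parallel virtual axes (phi_G = 0)
Z = metric_depth(disparity_px=4, b_N=10.0, B_G=2.0, p_N=0.005)
print(f"{Z:.1f} mm")  # 1000.0 mm
```

With `phi_G = 0` the function reduces to the parallel-axis form $Z = \frac{b_N B_G}{\Delta x\,p_N}$.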

3. Single-Shot Depth Estimation Pipelines

Classical Model-Based Pipeline

A canonical pipeline for metric depth map recovery from a single standard plenoptic camera shot includes the following steps:

  1. Radiometric correction and lens distortion removal on raw input.
  2. Micro-image grid recentering for sub-pixel accuracy.
  3. Extraction of sub-aperture images $E_{i,g}$ from the micro-image array.
  4. Selection of stereo pairs (with chosen micro-lens gap $G$), followed by stereo correspondence/disparity computation.
  5. Per-pixel conversion of disparity to metric depth as per the aforementioned triangulation formulae.
  6. Remapping of the depth map to a desired focal/reference plane.
  7. Optional post-processing (fusion of multiple disparities, denoising, regularization) for depth map refinement (Hahne et al., 2020).
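
Step 3 of this pipeline can be sketched as follows, assuming the micro-images have already been recentered onto an axis-aligned grid of `M` x `M` pixels (step 2). A sub-aperture image then collects the pixel at the same offset under every micro-lens. The function name and the toy data are hypothetical, and a practical implementation would use an array library rather than nested lists.

```python
def extract_subaperture(raw, M, u, v):
    """Collect pixel (u, v) from every M x M micro-image of `raw`.

    raw  : 2D list (rows of pixel values) whose height and width are
           multiples of M; micro-images are assumed axis-aligned and
           already recentered.
    u, v : column and row offset inside each micro-image, 0 <= u, v < M.
    Returns a 2D list with one pixel per micro-lens.
    """
    rows, cols = len(raw), len(raw[0])
    return [[raw[j + v][i + u] for i in range(0, cols, M)]
            for j in range(0, rows, M)]


# Toy 4 x 4 sensor with 2 x 2-pixel micro-images
raw = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
view_a = extract_subaperture(raw, M=2, u=0, v=0)
view_b = extract_subaperture(raw, M=2, u=1, v=0)
print(view_a)  # [[0, 2], [8, 10]]
print(view_b)  # [[1, 3], [9, 11]]
```

Two such views with different offsets form a stereo pair for the disparity computation in step 4.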

Learning-Based and Hybrid Approaches

Recent pipelines exploit both classic geometry and deep learning:

  • Generation of a sparse metric point cloud at micro-lens centers using a neural encoder-decoder ("Microlens Depth Network") trained with metric ground truth from a stereo rig.
  • Densification by predicting a dense relative depth map with a foundation model such as Depth Anything, followed by affine alignment of the dense map to the sparse metric support using a robust linear regression (Theil–Sen estimator).
  • This yields a single-shot, dense metric depth map well-aligned to the plenoptic imagery (Lasheras-Hernandez et al., 2024).
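
The alignment step can be illustrated with a small Theil–Sen fit. This is a sketch, not the cited authors' implementation: it assumes the relative depth values have already been sampled at the micro-lens centers where sparse metric depth is available, and it fits the affine map directly in depth space (whether the original work aligns depth or inverse depth is not stated here). All names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust line fit y ~ a * x + b via the Theil-Sen estimator."""
    # Slope: median of the slopes through every pair of samples
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    a = median(slopes)
    # Intercept: median residual once the slope is fixed
    b = median(yi - a * xi for xi, yi in zip(x, y))
    return a, b


def align_to_metric(dense_rel, sparse_rel, sparse_metric):
    """Scale and shift a relative depth map using sparse metric samples."""
    a, b = theil_sen(sparse_rel, sparse_metric)
    return [[a * d + b for d in row] for row in dense_rel]


# Sparse support following metric = 2 * rel + 0.5, with one gross outlier
sparse_rel = [0.1, 0.2, 0.3, 0.4, 0.5]
sparse_metric = [0.7, 0.9, 1.1, 9.0, 1.5]
a, b = theil_sen(sparse_rel, sparse_metric)
print(round(a, 6), round(b, 6))  # 2.0 0.5
```

The outlier at the fourth sample barely affects the fit, which is the reason for preferring a median-based estimator over ordinary least squares when the sparse metric points may contain mismatches.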

4. Integration of Multi-Cue Depth: Defocus and Correspondence

Multi-focus plenoptic cameras utilize MLAs with varying focal lengths, enabling the extraction of both correspondence (disparity) and defocus (blur) cues. This dual-cue paradigm enhances depth estimation stability and disambiguation, especially in textureless or ambiguous regions:

  • Defocus-aware imaging models extract a feature vector $[u, v, \rho, 1]^T$ for each micro-image, where $\rho$ is the blur-circle radius, predicted from the observed point-spread-function width.
  • The model relates $\rho$ to virtual depth via $\rho = m\,\nu^{-1} + q_i$, with $m, q_i$ from calibration. Inverse-projection recovers metric 3D positions (up to scale).
  • Blur-equalized matching employs Gaussian convolution to normalize micro-image PSF width before matching, and depth estimation is cast as an energy minimization over both cues.
  • Scale calibration corrects inherent bias in the model, typically via checkerboard capture at varying distances and optimization of a scaling polynomial $\gamma(z)$ to map all depths onto an absolute metric scale (Labussière et al., 2023).
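
The blur model above is easy to invert for the virtual depth $\nu$. The sketch below only rearranges $\rho = m\,\nu^{-1} + q_i$; the function names and the calibration values used in the round trip are hypothetical, not figures from the cited work.

```python
def blur_radius(nu, m, q_i):
    """Forward model: blur-circle radius rho = m / nu + q_i."""
    return m / nu + q_i


def virtual_depth_from_blur(rho, m, q_i):
    """Invert rho = m / nu + q_i for the virtual depth nu."""
    if rho == q_i:
        raise ValueError("rho equals q_i: virtual depth is unbounded")
    return m / (rho - q_i)


# Round trip with made-up calibration constants
m, q_i = 6.0, -1.5
rho = blur_radius(3.0, m, q_i)
print(rho)                                   # 0.5
print(virtual_depth_from_blur(rho, m, q_i))  # 3.0
```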

5. Calibration and Metric Scaling Methodologies

Accurate metric depth estimation mandates comprehensive calibration of the plenoptic imaging system:

  • MLA geometry: Focal lengths and pitch of micro-lenses, verified by manufacturer specification or optical fringe calibration.
  • Sensor geometry: Pixel pitch and grid orientation, typically provided by the sensor vendor.
  • Main-lens parameters: Focal length, principal-plane separations, and pupil positions. Calibration is typically achieved via MTF-focus scans, triangulation routines, or checkerboard calibrations (Zhang's method and hand-eye registration).
  • Empirical polynomial scaling ($\gamma(z)$) determined by fitting observed scale errors on ground-truth-known targets.
  • Extrinsic calibration aligns the plenoptic system to any auxiliary sensors such as stereo pairs or lidar (Hahne et al., 2020, Labussière et al., 2023, Lasheras-Hernandez et al., 2024).

6. Empirical Validation and Real-World Performance

Three principal studies provide quantitative performance metrics:

| Pipeline / Study | Dataset & Hardware | Metric Depth Accuracy | Notable Details |
| --- | --- | --- | --- |
| Hahne et al. (Hahne et al., 2020) | Custom SPC, both simulated (Zemax) & real scene targets | Metric depth error < ±0.33% (sim); < 0.5% (real rods) | Baseline/tilt deviation < 0.02%; single-capture, proximal depth (<5 m) |
| Labussière et al. (Labussière et al., 2023) | Multi-focus Raytrix, lidar ground truth, real/sim scenes | Mean median error ≲ 19 mm; scale error < 0.05% | BLADE algorithm, explicit blur cue, robust scale calibration |
| Lasheras-Hernandez et al. (Lasheras-Hernandez et al., 2024) | LFS dataset: Raytrix + stereo rig, indoor static scenes | RMSE = 8.50 cm (dense-final metric), 5.55 cm (sparse) | Deep learning with robust scale alignment, public dataset |

Across platforms, single-shot pipelines consistently achieve sub-percent or sub-decimeter metric accuracy in controlled settings once adequately calibrated. Errors concentrate near occlusion boundaries, and computation times currently range from about 10–30 s per frame (CPU) to several minutes per frame, depending on the depth regularization and network inference used.
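
For reference, the two kinds of error statistic quoted in the table (RMSE and median error) can be computed as in the sketch below. The exact evaluation protocols of the cited studies, such as masking or per-scene averaging, are not reproduced; the function name and the sample values are hypothetical.

```python
import math
from statistics import median


def depth_errors(pred, truth):
    """RMSE and median absolute error between paired depth samples."""
    diffs = [p - t for p, t in zip(pred, truth)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    med_abs = median(abs(d) for d in diffs)
    return rmse, med_abs


# Three perfect estimates and one 2 m outlier (depths in meters)
pred = [1.0, 2.0, 3.0, 4.0]
truth = [1.0, 2.0, 3.0, 2.0]
print(depth_errors(pred, truth))  # (1.0, 0.0)
```

The example shows why the two statistics are not interchangeable: a single outlier dominates the RMSE while leaving the median untouched.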

7. Limitations and Prospects

Despite robust performance, several operational and research challenges remain:

  • The achievable metric precision and depth range are limited by the intrinsically small baseline (on the order of millimeters) between micro-lens virtual views.
  • Densification of sparse metric maps relies on strong foundation models, with scale alignment constrained by sparse sampling and robust regression.
  • Heavy upfront calibration (geometric and extrinsic) imposes practical barriers to rapid deployment or hardware swapping.
  • Real-time operation is impeded by current processing pipelines, but GPU acceleration and global regularization (e.g., belief propagation, MRFs) are identified as future avenues.
  • Explicit occlusion models, outlier rejection, and end-to-end trainable systems are active research frontiers.
  • The approach is particularly favorable for close-range (0.5–5 m) robotic, automotive, and medical imaging scenarios, where low-occlusion, moving-scene resilience, and compact single-shot capture are highly valued (Hahne et al., 2020, Labussière et al., 2023, Lasheras-Hernandez et al., 2024).
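
The baseline limitation in the first point follows from the parallel-axis triangulation formula of Section 2. As a short sketch using the same symbols, differentiating $Z = \frac{b_N B_G}{\Delta x\,p_N}$ with respect to the disparity gives

$$\frac{\partial Z}{\partial \Delta x} = -\frac{b_N B_G}{\Delta x^2\,p_N} = -\frac{Z^2\,p_N}{b_N B_G}, \qquad |\delta Z| \approx \frac{Z^2\,p_N}{b_N B_G}\,|\delta(\Delta x)|$$

so, to first order, a fixed disparity error $\delta(\Delta x)$ produces a depth error that grows with $Z^2$ and shrinks in proportion to the baseline $B_G$. A millimeter-scale baseline therefore restricts accurate operation to short ranges.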

Continued development of public datasets, open-source calibration toolkits, and unified evaluation benchmarks will further solidify the role of single-shot metric depth from focused plenoptic cameras in 3D perception systems.
