Metric Depth from Focused Plenoptic Cameras

Updated 7 February 2026

The paper introduces a single-shot metric depth estimation technique using a focused plenoptic camera, leveraging calibrated light-field encoding and stereo triangulation principles.
It details both classical model-based and recent deep learning pipelines to extract dense or sparse depth maps from micro-image arrays with sub-decimeter accuracy.
Empirical results validate the approach while highlighting calibration challenges and processing constraints, with promising applications in robotics, automotive, and medical imaging.

Single-shot metric depth estimation from focused plenoptic cameras recovers quantitative scene geometry, measured in metric units, from a single capture. Focused, or "type 2.0," plenoptic cameras combine a main lens with a micro-lens array (MLA) to record both spatial and angular light-field information, allowing the extraction of dense or sparse depth maps with true scale. This addresses two longstanding limitations of conventional setups: stereo rigs need a multi-camera baseline, and monocular methods leave absolute scale unobservable. The following sections cover the foundational camera geometry, triangulation principles, algorithmic pipelines, recent learning-based advances, calibration and scaling strategies, and the state of empirical validation in this research domain.

1. Focused Plenoptic Camera Geometry and Light-Field Encoding

A focused plenoptic camera consists of a main lens (objective) and an MLA positioned to split the main-lens image into an array of spatially localized micro-images. The main lens, with focal length denoted $f$ or $f_U$, projects a continuous image onto the MLA, whose micro-lenses have focal length $f_{ml}$ or $f_s$ and pitch $p_M$. Each micro-lens forms a micro-image on the underlying sensor, whose pixel pitch $p_p$ limits the system's achievable spatial/angular resolution.

The commonly adopted two-plane parameterization encodes the light field as $L(x, y, u, v)$, where $(x, y)$ index spatial samples (micro-lens center positions) and $(u, v)$ index angular samples (typically corresponding to pixel positions within each micro-image). Calibration routines map raw sensor coordinates $(x_k, y_\ell)$ to light-field indices $(x, y, u, v)$ according to known MLA and sensor geometry. This setup enables each scene point to be imaged from multiple directions, facilitating subsequent multi-view correspondence and stereo-based triangulation (Hahne et al., 2020, Lasheras-Hernandez et al., 2024).
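The index mapping described above can be sketched in a few lines. This is a minimal illustration, not a calibrated decoder: it assumes an idealized, axis-aligned square micro-lens grid whose micro-images are exactly `M` x `M` pixels, and the function names and the parameter `M` are hypothetical. Real systems need the calibrated grid rotation, offset, and sub-pixel micro-image centers.

```python
import numpy as np

def sensor_to_lightfield(k, l, M):
    """Map raw sensor pixel (column k, row l) to indices (x, y, u, v).

    Assumption: axis-aligned square grid, micro-images exactly M x M pixels.
    """
    x, u = divmod(k, M)  # micro-lens column, pixel column inside micro-image
    y, v = divmod(l, M)  # micro-lens row, pixel row inside micro-image
    return x, y, u, v

def decode_lightfield(raw, M):
    """Reshape a raw (H, W) sensor image into a 4D array L[y, x, v, u]."""
    H, W = raw.shape
    ny, nx = H // M, W // M
    cropped = raw[: ny * M, : nx * M]  # drop incomplete micro-images
    # (ny, M, nx, M) -> (ny, nx, M, M): spatial indices first, angular last
    return cropped.reshape(ny, M, nx, M).transpose(0, 2, 1, 3)
```

With this layout, `L[y, x, v, u]` and `raw[y * M + v, x * M + u]` address the same sample, which makes the spatial/angular split explicit.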

2. Metric Depth Recovery via Disparity and Triangulation

Metric depth in plenoptic systems is fundamentally tied to the geometric relation between micro-lens (virtual viewpoint) positions and sub-aperture disparities. The standard triangulation formula for stereo depth estimation, $Z = \frac{b \, B}{\Delta x}$ (image distance $b$, baseline $B$, disparity $\Delta x$), is reinterpreted by treating pairs of micro-images as views from a virtual camera array with baseline $B_G$. For focused plenoptic cameras:

The effective baseline $B_G$ is determined by the MLA configuration and main-lens entrance pupil mapping.
The metric depth at disparity $\Delta x$ between two virtual views separated by $G$ micro-lens steps is given by

$Z_{G,\Delta x} = \frac{b_U \, B_G}{\Delta x \, p_N + b_U \tan(\Phi_G)}$

where $b_U$ is the virtual image distance, $p_N$ is the pixel scale, and $b_U \tan(\Phi_G)$ is a tilt correction (for non-parallel virtual axes). For parallel virtual axes, this simplifies to $Z_{G,\Delta x} = \frac{b_U \, B_G}{\Delta x \, p_N}$.
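As a worked illustration of the disparity-to-depth conversion, the sketch below assumes the triangulation takes the form $Z = b_U B_G / (\Delta x \, p_N + b_U \tan \Phi_G)$, with virtual image distance `b_U`, baseline `B_G`, pixel scale `p_N`, and tilt angle `phi_G`. The symbol names, the function name, and the numbers in the usage note are assumptions chosen for illustration, not values from the cited papers.

```python
import math

def depth_from_disparity(dx, b_U, B_G, p_N, phi_G=0.0):
    """Convert a disparity dx (pixels) to metric depth.

    Assumed model: Z = b_U * B_G / (dx * p_N + b_U * tan(phi_G)).
    b_U, B_G, p_N share one length unit (e.g. mm); phi_G is in radians.
    phi_G = 0 corresponds to parallel virtual optical axes.
    """
    denom = dx * p_N + b_U * math.tan(phi_G)
    if denom <= 0:
        raise ValueError("non-positive denominator: point at or beyond infinity")
    return b_U * B_G / denom
```

For example, with hypothetical values `b_U = 100` mm, `B_G = 1` mm, `p_N = 0.01` mm and a disparity of 2 pixels, the parallel-axis case gives a depth of about 5000 mm; a positive tilt term shortens that estimate.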

Disparity extraction is performed either by direct block-matching or more advanced cost aggregation algorithms such as semi-global matching (SGM), operating between sub-aperture images synthesized from the plenoptic 4D light field (Hahne et al., 2020, Lasheras-Hernandez et al., 2024).
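Direct block matching, the simpler of the two options named above, can be sketched as follows. This is a deliberately naive sum-of-absolute-differences (SAD) search over integer horizontal shifts between two sub-aperture images; the function name and parameters are hypothetical, and it omits the sub-pixel refinement and cost aggregation that SGM-style methods add.

```python
import numpy as np

def block_match(left, right, max_disp, half=2):
    """Integer disparity map via SAD block matching along image rows.

    For each pixel in `left`, find the shift d in [0, max_disp] such that
    the (2*half+1)^2 patch in `right` at column x - d matches best.
    Border pixels that cannot be matched are left at 0.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    H, W = left.shape
    disp = np.zeros((H, W), dtype=np.int32)
    for y in range(half, H - half):
        for x in range(half + max_disp, W - half):
            ref = left[y - half : y + half + 1, x - half : x + half + 1]
            costs = [
                np.abs(ref - right[y - half : y + half + 1,
                                   x - d - half : x - d + half + 1]).sum()
                for d in range(max_disp + 1)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

The double loop is slow but keeps the matching cost explicit; production pipelines would vectorize it or use an existing stereo matcher.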
In recent deep learning-based pipelines, geometric triangulation may be used implicitly to supervise networks trained on metric labels, allowing the sparse or dense recovery of absolute scene scale (Lasheras-Hernandez et al., 2024).

3. Single-Shot Depth Estimation Pipelines

Classical Model-Based Pipeline

A canonical pipeline for metric depth map recovery from a single standard plenoptic camera shot includes the following steps:

Radiometric correction and lens distortion removal on raw input.
Micro-image grid recentering for sub-pixel accuracy.
Extraction of sub-aperture images from the micro-image array.
Selection of stereo pairs (with chosen micro-lens gap $G$), followed by stereo correspondence/disparity computation.
Per-pixel conversion of disparity to metric depth as per the aforementioned triangulation formulae.
Remapping of the depth map to a desired focal/reference plane.
Optional post-processing (fusion of multiple disparities, denoising, regularization) for depth map refinement (Hahne et al., 2020).
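The sub-aperture extraction and pair-selection steps of this pipeline can be sketched as below. The sketch assumes an already rectified raw image with micro-images of exactly `M` x `M` pixels on an axis-aligned grid, and it interprets the gap `G` as the separation between two angular samples; both the helper names and that interpretation are assumptions made for illustration.

```python
import numpy as np

def sub_aperture(raw, M, u, v):
    """Sub-aperture image: pixel (u, v) taken from every M x M micro-image."""
    H, W = raw.shape
    cropped = raw[: (H // M) * M, : (W // M) * M]  # drop incomplete micro-images
    return cropped[v::M, u::M]

def stereo_pair(raw, M, u_ref, G, v):
    """Two horizontally separated views that are G angular samples apart."""
    if u_ref < 0 or u_ref + G >= M:
        raise ValueError("both views must lie inside the micro-image")
    return sub_aperture(raw, M, u_ref, v), sub_aperture(raw, M, u_ref + G, v)
```

A larger `G` widens the virtual baseline and therefore the disparity range, at the cost of using views closer to the micro-image edges.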

Learning-Based and Hybrid Approaches

Recent pipelines exploit both classic geometry and deep learning:

Generation of a sparse metric point cloud at micro-lens centers using a neural encoder-decoder ("Microlens Depth Network") trained with metric ground truth from a stereo rig.
Densification by predicting a dense relative depth map with a foundation model such as Depth Anything, followed by affine alignment of the dense map to the sparse metric support using a robust linear regression (Theil–Sen estimator).
This yields a single-shot, dense metric depth map well-aligned to the plenoptic imagery (Lasheras-Hernandez et al., 2024).
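The affine alignment step can be illustrated with a small Theil–Sen fit written directly in NumPy. This is a sketch under stated assumptions: the alignment is done in the depth domain with a single scale and shift, the sparse metric samples are addressed by hypothetical `rows`/`cols` index arrays, and all function names are illustrative rather than taken from the cited work.

```python
import numpy as np

def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # all pairs i < j
    dx = x[j] - x[i]
    ok = dx != 0                          # skip pairs with identical x
    slope = np.median((y[j] - y[i])[ok] / dx[ok])
    intercept = np.median(y - slope * x)
    return float(slope), float(intercept)

def align_to_metric(rel_dense, rows, cols, sparse_metric):
    """Scale and shift a relative dense map to match sparse metric depths."""
    a, b = theil_sen(rel_dense[rows, cols], sparse_metric)
    return a * rel_dense + b
```

Because the fit uses medians, a minority of bad sparse depths does not drag the scale, which is the usual motivation for preferring it over ordinary least squares here. The pairwise computation is quadratic in the number of sparse points, so large supports may need subsampling.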

4. Integration of Multi-Cue Depth: Defocus and Correspondence

Multi-focus plenoptic cameras utilize MLAs with varying focal lengths, enabling the extraction of both correspondence (disparity) and defocus (blur) cues. This dual-cue paradigm enhances depth estimation stability and disambiguation, especially in textureless or ambiguous regions:

Defocus-aware imaging models extract a feature vector for each micro-image that includes the blur-circle radius, predicted from the observed point-spread-function width.
The model relates the blur radius to virtual depth through a relation whose parameters come from calibration. Inverse projection then recovers 3D positions, up to a residual scale error.
Blur-equalized matching employs Gaussian convolution to normalize micro-image PSF width before matching, and depth estimation is cast as an energy minimization over both cues.
Scale calibration corrects inherent bias in the model, typically via checkerboard capture at varying distances and optimization of a scaling polynomial that maps all depths onto an absolute metric scale (Labussière et al., 2023).
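The blur-equalization idea can be sketched with plain NumPy. The sketch assumes Gaussian PSFs described by standard deviations `sigma_a` and `sigma_b` (in pixels), in which case the sharper micro-image needs an extra blur of $\sqrt{|\sigma_b^2 - \sigma_a^2|}$ because Gaussian variances add under convolution. The function names are hypothetical and the Gaussian PSF model is a simplification.

```python
import numpy as np

def equalizing_sigma(sigma_a, sigma_b):
    """Extra Gaussian sigma that brings the sharper image to the blurrier one."""
    return float(np.sqrt(abs(sigma_b ** 2 - sigma_a ** 2)))

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (kernel truncated at 3 sigma)."""
    img = np.asarray(img, dtype=float)
    if sigma <= 0:
        return img
    r = int(np.ceil(3 * sigma))
    t = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def equalize_blur(img_a, sigma_a, img_b, sigma_b):
    """Blur whichever micro-image is sharper so both share the same PSF width."""
    extra = equalizing_sigma(sigma_a, sigma_b)
    if sigma_a < sigma_b:
        return gaussian_blur(img_a, extra), np.asarray(img_b, dtype=float)
    return np.asarray(img_a, dtype=float), gaussian_blur(img_b, extra)
```

After equalization, a standard matching cost can compare the two micro-images without being biased by their different focus settings.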

5. Calibration and Metric Scaling Methodologies

Accurate metric depth estimation mandates comprehensive calibration of the plenoptic imaging system:

MLA geometry: Focal lengths and pitch of micro-lenses, verified by manufacturer specification or optical fringe calibration.
Sensor geometry: Pixel pitch and grid orientation, typically provided by the sensor vendor.
Main-lens parameters: Focal length, principal-plane separations, and pupil positions. Calibration is typically achieved via MTF-focus scans, triangulation routines, or checkerboard calibrations (Zhang method and hand-eye registration).
Empirical polynomial scaling determined by fitting observed scale errors on targets at known ground-truth distances.
Extrinsic calibration aligns the plenoptic system to any auxiliary sensors such as stereo pairs or lidar (Hahne et al., 2020, Labussière et al., 2023, Lasheras-Hernandez et al., 2024).
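The empirical polynomial scaling step above can be sketched with `numpy.polyfit`. The polynomial degree, the function names, and the idea of fitting measured depth directly to ground-truth depth are assumptions for illustration; the cited works may parameterize the correction differently.

```python
import numpy as np

def fit_scale_polynomial(z_measured, z_true, degree=2):
    """Least-squares polynomial mapping uncorrected depths to metric depths."""
    return np.polyfit(z_measured, z_true, degree)

def apply_scale_polynomial(coeffs, z):
    """Evaluate the fitted correction on new depth values."""
    return np.polyval(coeffs, z)
```

In practice the fit would use depths measured on a checkerboard placed at several known distances, and the correction should only be trusted inside the calibrated distance range, since polynomials extrapolate poorly.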

6. Empirical Validation and Real-World Performance

Three principal studies provide quantitative performance metrics:

Pipeline / Study	Dataset & Hardware	Metric Depth Accuracy	Notable Details
Hahne et al. (Hahne et al., 2020)	Custom SPC, both simulated (Zemax) & real scene targets	Metric depth error ±0.33% (sim); 0.5% (real rods)	Baseline/tilt deviation 0.02%; single-capture, proximal depth (<5 m)
Labussière et al. (Labussière et al., 2023)	Multi-focus Raytrix, lidar ground truth, real/sim scenes	Mean median error reported in mm; scale error 0.05%	BLADE algorithm, explicit blur cue, robust scale calibration
Lasheras-Hernandez et al. (Lasheras-Hernandez et al., 2024)	LFS dataset: Raytrix + stereo rig, indoor static scenes	RMSE = 8.50 cm (dense-final metric), 5.55 cm (sparse)	Deep learning with robust scale alignment, public dataset

Across platforms, single-shot pipelines consistently achieve sub-percent or sub-decimeter metric accuracy in controlled settings once adequately calibrated. Errors concentrate near occlusion boundaries, and computation times currently range from roughly 10–30 s per frame (CPU) to several minutes per frame, depending on the depth regularization and network inference used.
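For reference, the kinds of accuracy figures quoted in the table (RMSE, median error, relative error) can be computed as in the sketch below. The function name, the choice of metrics, and the validity mask are assumptions; each cited study defines its own evaluation protocol.

```python
import numpy as np

def depth_errors(pred, gt):
    """RMSE, median absolute error, and mean relative error (in percent).

    Pixels without valid ground truth (NaN or non-positive) are ignored.
    Outputs use the same length unit as the inputs.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = np.isfinite(gt) & np.isfinite(pred) & (gt > 0)
    err = pred[valid] - gt[valid]
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "median_abs": float(np.median(np.abs(err))),
        "mean_rel_pct": float(100.0 * np.mean(np.abs(err) / gt[valid])),
    }
```

RMSE is sensitive to the boundary outliers mentioned above, while the median is not, so reporting both gives a fuller picture of where a pipeline fails.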

7. Limitations and Prospects

Despite robust performance, several operational and research challenges remain:

The achievable metric precision and depth range are limited by the intrinsically small, millimetre-scale baseline between micro-lens virtual views.
Densification of sparse metric maps relies on strong foundation models, with scale alignment constrained by sparse sampling and robust regression.
Heavy upfront calibration (geometric and extrinsic) imposes practical barriers to rapid deployment or hardware swapping.
Real-time operation is impeded by current processing pipelines, but GPU acceleration and global regularization (e.g., belief propagation, MRFs) are identified as future avenues.
Explicit occlusion models, outlier rejection, and end-to-end trainable systems are active research frontiers.
The approach is particularly favorable for close-range (0.5–5 m) robotic, automotive, and medical imaging scenarios, where low-occlusion, moving-scene resilience, and compact single-shot capture are highly valued (Hahne et al., 2020, Labussière et al., 2023, Lasheras-Hernandez et al., 2024).
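
The baseline limitation listed first can be made concrete with a first-order error propagation. As a sketch, assume the parallel-axis triangulation relation $Z = b_U B_G / (\Delta x \, p_N)$, with virtual image distance $b_U$, baseline $B_G$, pixel scale $p_N$, and disparity $\Delta x$ in pixels (these symbol names are assumptions of the sketch). Differentiating with respect to the disparity gives

$$\frac{\partial Z}{\partial \Delta x} = -\frac{b_U B_G}{\Delta x^2 \, p_N} = -\frac{Z^2 \, p_N}{b_U B_G}, \qquad |\delta Z| \approx \frac{Z^2 \, p_N}{b_U B_G} \, |\delta \Delta x|$$

For a fixed disparity error $\delta \Delta x$, the depth uncertainty therefore grows quadratically with distance and inversely with the baseline, which is consistent with the close-range operating regime noted above.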

Continued development of public datasets, open-source calibration toolkits, and unified evaluation benchmarks will further solidify the role of single-shot metric depth from focused plenoptic cameras in 3D perception systems.

References (3)

Baseline and Triangulation Geometry in a Standard Plenoptic Camera (2020)

Single-Shot Metric Depth from Focused Plenoptic Cameras (2024)

Blur aware metric depth estimation with multi-focus plenoptic cameras (2023)
