Multi-Exposure Image Fusion
- Multi-exposure image fusion is a technique that combines a series of low dynamic range images captured at different exposures to produce a single balanced, high-quality output.
- It leverages methods such as transform-domain processing, variational optimization, and deep learning to preserve detail, manage contrast, and reduce artifacts.
- Advanced approaches like block-DCT averaging, Retinex-based decomposition, and efficient lookup table methods achieve high image quality, evidenced by excellent PSNR and SSIM scores.
Multi-Exposure Image Fusion Algorithm
Multi-exposure image fusion (MEF) is a computational imaging strategy that synthesizes a high-quality single image from a stack of differently exposed low dynamic range (LDR) images of the same scene. Its primary objective is to overcome the constraints of sensor dynamic range, simultaneously preserving highlight and shadow detail, enhancing local contrast, mitigating artifacts, and producing an output amenable to standard displays without explicit tone-mapping. State-of-the-art MEF encompasses a spectrum of methodologies, including transform-domain approaches, deep neural networks, unsupervised learning, and sophisticated perceptual weighting mechanisms, each addressing a distinct aspect of photometric consistency, robustness, and computational scalability.
1. Problem Formulation and General Principles
Let the MEF input be a stack of spatially aligned LDR images $\{I_k\}_{k=1}^{N}$, each captured at a different exposure setting or exposure value (EV). The fusion algorithm seeks to estimate an output image $F$ that, at every pixel $(i,j)$, aggregates the salient structural, chromatic, and photometric content of the input stack. The fusion is typically guided by three local features: well-exposedness, contrast, and saturation, as originally formalized in the exposure fusion technique of Mertens et al.:

$$W_{ij,k} = \left(C_{ij,k}\right)^{\omega_C} \cdot \left(S_{ij,k}\right)^{\omega_S} \cdot \left(E_{ij,k}\right)^{\omega_E},$$

where $C_{ij,k}$ is local contrast, $S_{ij,k}$ is chromatic saturation, and $E_{ij,k}$ is well-exposedness (often measured via a Gaussian centered around mid-gray). These pixel-wise weights, normalized across the stack, guide either a direct combination or a multi-resolution (e.g., Laplacian pyramid) blending of the input stack.
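The three weight terms above can be sketched in NumPy: a discrete Laplacian magnitude for contrast, the per-pixel channel standard deviation for saturation, and a Gaussian around mid-gray for well-exposedness, following Mertens et al. (the function name, the default σ = 0.2, and the unit exponents are illustrative choices, not prescribed by the source):

```python
import numpy as np

def mertens_weights(stack, wc=1.0, ws=1.0, we=1.0, sigma=0.2):
    """Per-pixel weights W_k = C^wc * S^ws * E^we for an aligned LDR stack.

    stack: list of float RGB images in [0, 1], each of shape (H, W, 3).
    Returns weights of shape (N, H, W), normalized across exposures.
    """
    weights = []
    for img in stack:
        gray = img.mean(axis=2)
        # Contrast C: magnitude of a discrete Laplacian response.
        contrast = np.abs(
            np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
            + np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4.0 * gray
        )
        # Saturation S: standard deviation across the RGB channels.
        saturation = img.std(axis=2)
        # Well-exposedness E: Gaussian around mid-gray, product over channels.
        exposedness = np.prod(
            np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2)), axis=2
        )
        weights.append(
            contrast ** wc * saturation ** ws * exposedness ** we + 1e-12
        )
    w = np.stack(weights)
    return w / w.sum(axis=0, keepdims=True)
```

Normalizing across exposures makes the maps directly usable as blend coefficients, either per pixel or after Laplacian-pyramid decomposition.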
Subsequent developments introduce adaptive or learned weighting, transform-domain fusion (such as DCT or Fourier), variational formulations, Retinex-based decomposition, lookup-table schemes, and end-to-end deep networks that use spatial, frequency, or multi-modal features for decision-making (Ramakarishnan et al., 2021, Chen et al., 2024, Visavakitcharoen et al., 2019, Yang et al., 2023, Qu et al., 2021, Su et al., 2024, Zhang, 2020).
2. Transform-Domain and Analytical Fusion Methods
2.1 Block-DCT Averaging
Ramakrishnan & Pete propose DCT-domain blockwise fusion, partitioning each color channel into blocks and computing the 2D DCT coefficients of each block across exposures:

$$D_k^{(b)} = \mathrm{DCT}_2\!\left(B_k^{(b)}\right),$$

with $B_k^{(b)}$ denoting block $b$ of exposure $k$ (Ramakarishnan et al., 2021). Fusion is performed by coefficient-wise averaging:

$$\bar{D}^{(b)} = \frac{1}{N}\sum_{k=1}^{N} D_k^{(b)}.$$

Inverse DCT on $\bar{D}^{(b)}$ yields the fused block. The DC (average) coefficient encodes well-exposedness, energy in the AC coefficients quantifies contrast, and the statistical variability of Cr/Cb DCT energies relates to color saturation. This approach is non-parametric and non-pyramidal, and yields performance competitive with Laplacian pyramid fusion and state-of-the-art filtering/tonemap-based HDR methods in PSNR (70 dB), SSIM (0.989), low IMMSE, and VDP-Q (98.2%).
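A minimal sketch of blockwise DCT averaging, assuming an orthonormal 2D DCT, a block size that divides the image dimensions, and illustrative helper names:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    T[0] /= np.sqrt(2.0)
    return T

def fuse_block_dct(channels, block=8):
    """Fuse one color channel across N exposures by blockwise DCT averaging."""
    T = dct_matrix(block)
    h, w = channels[0].shape
    out = np.zeros((h, w))
    for y in range(0, h, block):
        for x in range(0, w, block):
            # 2D DCT of the co-located block in every exposure ...
            coeffs = [T @ c[y:y + block, x:x + block] @ T.T for c in channels]
            # ... coefficient-wise average, then inverse DCT.
            out[y:y + block, x:x + block] = T.T @ np.mean(coeffs, axis=0) @ T
    return out
```

Because uniform coefficient averaging is linear, this baseline coincides with pixel-domain averaging; the transform-domain view becomes essential once coefficients are weighted or selected (e.g., by AC energy) rather than averaged uniformly.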
2.2 Variational and Information-Theoretic Methods
Variational MEF methods express fusion as an optimization problem combining a data-fidelity term, weighted by spatially-varying importance maps $\lambda_k$, with a multiscale regularization term over Laplacian pyramid bands:

$$E(F) = \sum_{k} \sum_{i,j} \lambda_k(i,j)\,\bigl(F(i,j) - I_k(i,j)\bigr)^2 + \mu \sum_{l} \bigl\|\mathcal{L}_l\{F\}\bigr\|^2,$$

where $\mathcal{L}_l\{F\}$ are the pyramid bands (Singh et al., 2022). The maps $\lambda_k$ can be computed as the normalized local entropy of pixel neighborhoods, so regions of maximal local structural variability (texture) receive higher weights. Contrast-Limited Adaptive Histogram Equalization (CLAHE) improves uniformity and exposes midtone detail before this weighting step.
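The entropy-derived importance maps can be sketched with a brute-force window scan (patch size and bin count are illustrative assumptions; a production implementation would use sliding-window histograms):

```python
import numpy as np

def local_entropy_weights(stack, patch=5, bins=32):
    """Normalized local-entropy importance maps, one per exposure.

    stack: list of grayscale float images in [0, 1], shape (H, W).
    """
    H, W = stack[0].shape
    r = patch // 2
    maps = []
    for img in stack:
        q = np.clip((img * bins).astype(int), 0, bins - 1)  # quantize
        ent = np.zeros((H, W))
        for y in range(H):
            for x in range(W):
                win = q[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
                p = np.bincount(win.ravel(), minlength=bins) / win.size
                p = p[p > 0]
                ent[y, x] = -(p * np.log2(p)).sum()  # Shannon entropy
        maps.append(ent + 1e-12)
    m = np.stack(maps)
    return m / m.sum(axis=0, keepdims=True)
```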
2.3 Principal Component and Saliency Weighting
PAS-MEF combines PCA-derived weights, adaptive well-exposedness, and saliency maps. The first principal component of the grayscale inputs identifies dominant intensity structure, adaptive Gaussians center the well-exposedness weighting, and DCT-based saliency predicts perceptual importance (Karakaya et al., 2021). Weights are refined with a guided filter, then used to blend Laplacian pyramid imagery, achieving strong MEF-SSIM scores (0.986) with minimal computational overhead.
3. Deep Learning and Hybrid Paradigms
3.1 Transformer- and CNN-Based Fusion Networks
Recent deep learning architectures for MEF include:
- TransMEF: Combines convolutional layers (local extraction) and transformer layers (global dependencies) in an encoder-decoder design, trained via self-supervised multi-task learning. Three reconstruction tasks—gamma perturbation, Fourier corruption, region shuffling—equip the network with exposure-robust priors. At inference, feature maps from each exposure are averaged and decoded. TransMEF achieves top performance on multiple no-reference metrics (e.g., QMI, QTE, PSNR, SSIM) across MEF benchmarks (Qu et al., 2021).
- MEF-SFI: Jointly leverages spatial convolutions (for local structure) and deep Fourier transform (for global illumination). Dual-path fusion modules extract and recombine amplitude and phase spectra, enabling global exposure balancing while preserving texture. Losses are computed in both image and frequency domains, with quantitative superiority on mutual information, SSIM, FMI metrics (Yang et al., 2023).
- MobileMEF: Highly efficient encoder-decoder with depthwise separable convolutions, lightweight building blocks, and a single-scale fusion bypass for fast (sub-2 s for 4K) mobile MEF. It outperforms existing deep models in PSNR and SSIM on the SICE benchmark and provides demonstrable runtime and memory advantages on embedded hardware (Kirsten et al., 2024).
- Retinex-MEF: Uses unsupervised learning to decompose each input into an illumination map and a shared reflectance, modeling overexposure glare explicitly via a learned additive component. A suite of training losses enforces spatial smoothness, reflectance consistency, and suppression of overexposure. Exposure is tunable post hoc via a closed-form monotonic function. Retinex-MEF outperforms prior unsupervised and deep models on metrics sensitive to dynamic range and glare (Bai et al., 10 Mar 2025).
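The inference-time pattern shared by several of these networks — encode each exposure, average the feature maps across the stack, decode — can be illustrated with toy linear maps standing in for the learned encoder and decoder (the random matrices below are placeholders, not trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the learned encoder/decoder: a fixed random linear map
# into a 16-dim feature space and its pseudo-inverse back to RGB. Real
# methods use trained CNN/transformer encoders; only the fusion pattern
# (encode -> average features -> decode) is the point here.
W_ENC = rng.standard_normal((3, 16)) * 0.1
W_DEC = np.linalg.pinv(W_ENC)

def fuse_by_feature_averaging(stack):
    """Encode each exposure, average feature maps across the stack, decode."""
    feats = [img @ W_ENC for img in stack]   # (H, W, 16) per exposure
    return np.mean(feats, axis=0) @ W_DEC    # decode the averaged features
```

With linear maps this reduces to pixel averaging; the learned, non-linear encoders in TransMEF and MEF-SFI are what make feature-space averaging more expressive than its pixel-space counterpart.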
3.2 Lookup Table and Implicit Representation Methods
- MEFLUT: Encodes fusion weights as 1D learned lookup tables (LUTs), one per exposure, mapping input pixel value to fusion weight. Weight estimation CNNs leverage frame, channel, and spatial attention. The LUTs are filled by feeding constant-value images into the CNN and extracting mean response, enabling runtime MEF that is both parameter-light and extremely fast (∼0.5–4 ms/4K frame), with SOTA SSIM/MEF-SSIM on SICE and custom datasets (Jiang et al., 2023).
- Distilled 3D LUT Grid: Uses a lightweight teacher-student setup in which the student network is trained to regress a high-resolution, editable 3D LUT grid as an implicit neural function of scene content. This model supports arbitrary grid resolution at inference and achieves real-time UHD MEF (e.g., 102 ms per 4K frame on a Pixel 4 XL) with competitive PSNR/SSIM on the SICE, NTIRE2022, and MED datasets (Su et al., 2024).
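The lookup step that makes the 1D-LUT approach fast can be sketched as follows (the LUT length of 256 and the normalization scheme are illustrative assumptions):

```python
import numpy as np

def lut_fuse(stack, luts):
    """Weighted fusion where per-exposure weights come from 1D LUT lookups.

    stack: list of float RGB images in [0, 1]; luts: list of length-256
    arrays mapping 8-bit intensity to a fusion weight.
    """
    weights = []
    for img, lut in zip(stack, luts):
        idx = np.clip((img * 255.0).astype(int), 0, 255)
        weights.append(lut[idx] + 1e-12)      # O(1) lookup per pixel
    w = np.stack(weights)
    w /= w.sum(axis=0, keepdims=True)          # normalize across exposures
    return (w * np.stack(stack)).sum(axis=0)
```

Once the tables are distilled from the weight-estimation CNN, inference reduces entirely to this indexing and blending, which is what enables millisecond-scale 4K fusion.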
4. Practical Pipeline Variants and Implementation Notes
4.1 Registration and Robustness to Dynamics
For scenes captured with handheld devices or featuring dynamic objects, alignment is crucial:
- Hybrid Flow+PatchMatch: An initial dense optical flow aligns exposures, followed by superpixel-based error detection and PatchMatch refinement in regions where occlusion or parallax causes poor alignment. This achieves ghost-free, detail-preserving fusion at a runtime (about 12 s for a stack of five 1.5 MP frames) competitive with fast patch-based and deep models (Li et al., 2023).
4.2 Denoising and Noise-Aware MEF
DCT-based joint denoising and fusion exploits collaborative filtering:
- Overlapped block-based DCT, collaborative thresholding in patch bundles matched spatially and across exposures, and DCT-domain fusion (contrast-favored for AC, exposure-favored for DC). This removes noise and fuses in situ, outperforming sequential denoise-then-fuse strategies (Buades et al., 2021).
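A simplified per-block sketch of this idea, with hard-thresholding standing in for collaborative filtering, max-magnitude selection for the AC terms (contrast-favored), and averaging for the DC term (exposure-favored); the threshold value is illustrative:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    T[0] /= np.sqrt(2.0)
    return T

def denoise_fuse_block(blocks, thr=0.05):
    """Joint denoise/fuse of one co-located block from each exposure."""
    T = dct_matrix(blocks[0].shape[0])
    coeffs = np.stack([T @ b @ T.T for b in blocks])
    coeffs = np.where(np.abs(coeffs) > thr, coeffs, 0.0)  # denoise (threshold)
    pick = np.abs(coeffs).argmax(axis=0)                  # contrast rule (AC)
    fused = np.take_along_axis(coeffs, pick[None], axis=0)[0]
    fused[0, 0] = coeffs[:, 0, 0].mean()                  # exposure rule (DC)
    return T.T @ fused @ T                                # inverse DCT
```

Doing the thresholding and the fusion in the same DCT pass is what distinguishes this scheme from sequential denoise-then-fuse pipelines.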
4.3 Exposure and Color Compensation
- Automatic Exposure Compensation: Adjusts input LDR luminance using scene statistics (mapping the geometric mean to a standardized gray level), ensuring each input is well exposed before fusion. Simple average fusion of the adjusted images surpasses traditional weighted schemes in entropy and statistical naturalness (Kinoshita et al., 2018).
- Inverse Camera Response Correction: Recovers a radiometrically linear HDR map via inverse CRF, then corrects color drift in fused LDR outputs by enforcing hue consistency with the HDR estimate using constant-hue plane decomposition (Visavakitcharoen et al., 2019).
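The geometric-mean compensation in the first item above can be sketched as below (the 0.18 mid-gray target and the Rec. 709 luma weights are assumptions for illustration):

```python
import numpy as np

def exposure_compensate(img, target_gray=0.18, eps=1e-6):
    """Scale an RGB image so its luminance geometric mean hits mid-gray."""
    lum = 0.2126 * img[..., 0] + 0.7152 * img[..., 1] + 0.0722 * img[..., 2]
    gmean = np.exp(np.log(lum + eps).mean())   # geometric mean of luminance
    return np.clip(img * (target_gray / gmean), 0.0, 1.0)
```

Applying this to every input equalizes their brightness statistics, which is why a plain average of the compensated images can already compete with weighted fusion schemes.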
5. Benchmarks, Evaluation, and Quantitative Analysis
The MEFB benchmark (Zhang, 2020) underpins empirical evaluation of MEF methods, offering:
- 100 static pairs covering diverse lighting/scene conditions.
- 20 metrics, including structural similarity (SSIM, MEF-SSIM), mutual information, entropy, VIF, and perceptual indices.
- Systematic findings: edge-preserving approaches (MGFF) yield highest subjective quality; deep models (IFCNN) dominate information-theoretic metrics; pyramid-based methods (FMMEF) excel in structural consistency; and fast discriminative approaches offer favorable speed-quality trade-offs.
Recent algorithms report PSNR values up to 70 dB (block-DCT), SSIM above 0.98 (block-DCT, TransMEF, MobileMEF), and reliable perceptual indices (VDP-Q 98%, MEF-SSIM 0.97–0.99, VIF >1.4). New UHD-oriented models such as IPL and implicit 3D LUTs achieve real-time fusion at 4K with only moderate computational expense (Chen et al., 2024, Su et al., 2024).
6. Challenges, Limitations, and Future Directions
Persistent challenges in MEF research include:
- Dynamic Scene Fusion: Many methods presuppose static scenes or pre-aligned stacks; robust alignment and ghost handling remain nontrivial, though hybrid flow/patch approaches and deep optical flow integration show promise (Li et al., 2023, Chen et al., 2024).
- Color Fidelity: Fusing chromatic information without artifact or desaturation is challenging. Hybrid methods (Retinex, LUT-based, HSL-guided enhancement) partially mitigate hue shifts and saturation loss (Bai et al., 10 Mar 2025, Mu et al., 2024).
- Computational Efficiency: Real-time UHD fusion necessitates algorithmic lightweighting, efficient transform or LUT-based implementations, or hardware-specific optimization (e.g., depthwise separable convolutions, quantization) without loss of quality (Kirsten et al., 2024, Su et al., 2024).
- Evaluation Diversity: Modern MEF often departs from objective HDR ground-truth, requiring reliance on perceptual or no-reference metrics; standardized benchmarks (MEFB, SICE, PQA-MEF) and diversified evaluation are critical to avoid overfitting to a specific quality criterion (Zhang, 2020, Yang et al., 2023).
Research trends point toward real-time, high-fidelity fusion systems deployable on consumer devices and intelligent, possibly multi-modal guidance (e.g., text-driven, as in MTIF). There is ongoing integration with burst photography, efficient HDR reconstruction, and semantic-aware fusion for scene understanding applications.
7. Summary Table: Representative MEF Methodologies
| Class | Method / Reference | Key Features / Innovations |
|---|---|---|
| Transform-domain (analytical) | Block-DCT (Ramakarishnan et al., 2021) | Non-parametric coefficient averaging; fast; high quality |
| Variational/Weighted Fusion | Entropy/CLAHE (Singh et al., 2022) | Data-fidelity & multiscale regularization |
| Data-driven, Deep Learning | TransMEF (Qu et al., 2021) | Self-supervised, transformer-CNN hybrid |
| Lookup-table, Efficient | MEFLUT (Jiang et al., 2023) | 1D LUT with CNN/attention for real-time mobile fusion |
| Retinex-based Decomposition | Retinex-MEF (Bai et al., 10 Mar 2025) | Glare-aware, unsupervised, controllable exposure |
| Registration/Alignment (Dynamic) | OF+PatchMatch (Li et al., 2023) | Optical flow + PatchMatch, superpixel, ghost elimination |
| Implicit 3D LUT, UHD, Editable | Distilled LUT (Su et al., 2024) | Teacher-student, implicit MLP, arbitrary grid resolution |
Each approach is characterized by a distinct set of mathematical tools, architectural design, and quality/efficiency trade-offs, reflecting the diversity and maturity of the MEF research landscape.