
Retinex-Diffusion: Restoration & Synthesis

Updated 20 January 2026
  • Retinex-Diffusion is a framework that combines physical image decomposition into reflectance and illumination with diffusion models for effective restoration and synthesis.
  • It employs transformer-based decomposition and latent space optimization with energy-based guidance to precisely control lighting, texture, and color.
  • Empirical results show state-of-the-art performance with improved metrics like PSNR and SSIM, enhancing low-light and extreme illumination imaging tasks.

Retinex-Diffusion denotes a diverse family of methods that combine the physical decomposition of images into reflectance and illumination—the Retinex theory—with the generative and restorative machinery of diffusion models. This integration enables explicit manipulation and restoration of lighting conditions, texture, and color in both low-light enhancement and conditional image synthesis tasks. Recent approaches range from physics-guided energy injection into unconditional DDPM sampling (Xing et al., 2024), through latent-space Retinex decomposition for unsupervised restoration (Jiang et al., 2024), to transformer-guided deep priors and compact latent diffusion (He et al., 2023, Yi et al., 2023), and PDE-based anisotropic diffusion for controlled illumination correction (Nnolim, 2017). These frameworks jointly address the ill-posed factorization of image formation, the challenge of generating plausible scene content in extreme lighting, and the need for interpretable, geometry-aware visual manipulation.

1. Retinex Theory and Image Formation

The Retinex theory assumes that an observed image I is the element-wise product of a reflectance R (intrinsic albedo, illumination-invariant) and an illumination L, sometimes written S (locally smooth, varying with lighting):

I(x) = R(x) \odot L(x)

In the context of contemporary Retinex-Diffusion, this decomposition underpins both restoration (extracting clean scene content from degraded input) and controllable synthesis (manipulating shadows, highlights, and overall lighting) (Yi et al., 2023, Xing et al., 2024). For physically motivated control, the illumination and reflectance fields are handled separately, enabling fine-grained adjustment of image properties.
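The multiplicative model above can be illustrated with a minimal single-scale Retinex decomposition: illumination is estimated as a Gaussian-smoothed copy of the image, and reflectance is the element-wise ratio. This is a generic sketch, not any cited method's pipeline; the `sigma` and `eps` values are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_decompose(img, sigma=15.0, eps=1e-6):
    """Split an image into reflectance and illumination with I = R * L.

    Illumination is approximated by Gaussian smoothing (a classic
    single-scale Retinex heuristic); reflectance is the residual ratio.
    """
    illumination = gaussian_filter(img, sigma=sigma) + eps
    reflectance = img / illumination
    return reflectance, illumination

# Recomposition recovers the input exactly, since R = I / L by construction.
img = np.random.default_rng(0).random((64, 64))
R, L = retinex_decompose(img)
```

Real decomposition networks replace the fixed Gaussian prior with learned, attention-driven estimators, but the factorization they invert is this same product.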

2. Diffusion Models for Restoration and Generation

Diffusion models iteratively transform noise samples into structured outputs through a stochastic process defined by a forward SDE (injecting noise) and a learned reverse SDE (denoising via a score network \epsilon_\theta):

Forward diffusion:

q(I_t \mid I_0) = \mathcal{N}(I_t; \sqrt{\bar\alpha_t}\, I_0, (1 - \bar\alpha_t)\mathbf{I})

Reverse diffusion:

p_\theta(I_{t-1} \mid I_t, I_c) = \mathcal{N}(I_{t-1}; \mu_\theta(I_t, I_c, t), \sigma_t^2 \mathbf{I})

The denoising is conditioned on Retinex priors, either extracted directly or learned, and may be performed in latent spaces for computational efficiency (He et al., 2023, Jiang et al., 2024). Guidance energy based on illumination/shading and reflectance is injected during the sampling step for direct manipulation of lighting properties (Xing et al., 2024).
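The closed-form forward marginal above can be sampled in one shot. The sketch below assumes a cosine noise schedule (an illustrative choice; any monotone schedule with \bar\alpha_0 = 1 works) and is not tied to any particular cited model.

```python
import numpy as np

def cosine_alpha_bar(t, T):
    # Hypothetical cosine schedule: alpha_bar decays from 1 toward 0.
    return np.cos(0.5 * np.pi * t / T) ** 2

def forward_diffuse(x0, t, T, rng):
    """Sample I_t ~ q(I_t | I_0) = N(sqrt(ab_t) I_0, (1 - ab_t) I)."""
    ab = cosine_alpha_bar(t, T)
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise, noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # stand-in clean image I_0
xt, eps = forward_diffuse(x0, t=250, T=1000, rng=rng)
```

Training a denoiser then amounts to regressing `eps` from `xt` and `t`; Retinex-Diffusion methods additionally feed the conditioning input I_c (e.g. reflectance or illumination priors) into that network.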

3. Retinex-Based Conditioning and Decomposition Networks

A key innovation is the fusion of transformer-based or convolutional decomposition networks with diffusion. These networks decompose input images into reflectance and illumination maps using attention mechanisms (e.g., multi-head depth-wise attention in the Retinex Transformer Decomposition Network, TDN (Yi et al., 2023)), latent encoding (content-transfer decomposition in CTDN (Jiang et al., 2024)), or learned priors (He et al., 2023).

Example: For paired inputs (I_l, I_n), TDN optimizes

L_{\mathrm{decomp}} = L_{rec} + \gamma_{rc} L_{rc} + \gamma_{sm} L_{smooth}

with explicit objectives for reconstruction, illumination smoothness, and reflectance consistency.
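A numpy sketch of such an objective is below, with L1 distances and the weights `gamma_rc`, `gamma_sm` as illustrative stand-ins for the paper's exact terms: reconstruction ties each branch's R \odot L product back to its input, reflectance consistency makes albedo illumination-invariant across the pair, and a total-variation-style penalty keeps illumination smooth.

```python
import numpy as np

def smoothness(L_map):
    # Total-variation-style penalty encouraging locally smooth illumination.
    return np.abs(np.diff(L_map, axis=0)).mean() + np.abs(np.diff(L_map, axis=1)).mean()

def decomp_loss(I_low, I_norm, R_low, R_norm, L_low, L_norm,
                gamma_rc=0.1, gamma_sm=0.1):
    """Sketch of L_decomp = L_rec + gamma_rc * L_rc + gamma_sm * L_smooth."""
    # Reconstruction: each branch should reproduce its input as R * L.
    l_rec = (np.abs(R_low * L_low - I_low).mean()
             + np.abs(R_norm * L_norm - I_norm).mean())
    # Reflectance consistency: albedo should not depend on lighting.
    l_rc = np.abs(R_low - R_norm).mean()
    # Illumination smoothness on both maps.
    l_sm = smoothness(L_low) + smoothness(L_norm)
    return l_rec + gamma_rc * l_rc + gamma_sm * l_sm

# A perfect decomposition (shared reflectance, constant illumination,
# exact reconstruction) drives the loss to zero.
R = np.random.default_rng(0).random((4, 4))
Lc = np.ones((4, 4))
loss = decomp_loss(R * Lc, R * Lc, R, R, Lc, Lc)
```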

In latent Retinex-Diffusion, the decomposition and conditioning occur within a feature space rather than pixel space, allowing unsupervised and generalizable restoration (Jiang et al., 2024, He et al., 2023).

4. Physical Control and Guidance in Diffusion Sampling

Retinex-Diffusion frameworks implement explicit energy-based guidance to steer the generative process toward desired illumination states:

Energy decomposition:

E(x_t; y, t) = \lambda_I E_I(f_s(x_t), y_s) + \lambda_R E_R(f_c(x_t), y_c)

where E_I matches the estimated illumination to user prompts (Gaussian mixtures modeling lighting), and E_R enforces reflectance consistency using cross-color ratios (Xing et al., 2024).

This approach enables training-free relighting, shadow synthesis, and geometric fidelity in generated and real-image editing, bypassing the need for latent direction searches or extensive retraining.
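The guidance mechanism can be sketched as a modified reverse step that shifts the predicted mean down the energy gradient before sampling. Everything below is a toy stand-in: a quadratic energy replaces E_I, an identity map replaces the denoiser mean \mu_\theta, and `scale` is an illustrative guidance weight.

```python
import numpy as np

def guided_step(mu, sigma, x_t, energy_grad, rng, scale=200.0):
    """One energy-guided reverse step:
       x_{t-1} ~ N(mu - scale * sigma^2 * grad E(x_t), sigma^2 I)."""
    drift = scale * sigma**2 * energy_grad(x_t)
    return mu - drift + sigma * rng.standard_normal(mu.shape)

# Toy energy E = 0.5 * ||x - y_s||^2 pulling samples toward a target
# illumination map y_s; its gradient is simply (x - y_s).
y_s = np.full((4, 4), 0.8)
grad_E = lambda x: x - y_s

rng = np.random.default_rng(0)
x = np.zeros((4, 4))          # start far from the lighting target
sigma = 0.05
for _ in range(30):
    # Stand-in denoiser: identity mean (a real model predicts mu_theta).
    x = guided_step(mu=x, sigma=sigma, x_t=x, energy_grad=grad_E, rng=rng)
```

After a few steps the sample concentrates near `y_s` without the energy ever being part of training, which is the sense in which such guidance is training-free.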

5. Training, Inference, and Implementation

  • Stage-wise Training: Separates decomposition network optimization (for content, reflectance, and illumination extraction) from diffusion model training (noise prediction, consistency regularization) (Yi et al., 2023, Jiang et al., 2024, He et al., 2023).
  • Latent Space Diffusion: Utilizes compact latent representations for both reflectance and illumination, decreasing computational burden and mitigating pixel misalignment issues (He et al., 2023).
  • PDE-based Retinex-Diffusion: Employs anisotropic diffusion equations, entropy-guided stopping, and processing in the HSI/HSV color-space for robust and fully automated illumination correction (Nnolim, 2017).
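The PDE route can be illustrated with a generic Perona–Malik-style scheme (a standard anisotropic diffusion, not Nnolim's exact formulation; `kappa`, `step`, and the iteration count are illustrative, and the `np.roll` neighbours imply periodic boundaries):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.1, step=0.2):
    """Perona-Malik-style anisotropic diffusion: smooth flat regions
    while an edge-stopping conductance preserves strong gradients."""
    u = img.astype(np.float64).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # conductance in (0, 1]
    for _ in range(n_iter):
        # Differences to the four neighbours (periodic boundaries).
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Explicit update; step <= 0.25 keeps the scheme stable.
        u += step * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

img = np.random.default_rng(0).random((16, 16))
out = anisotropic_diffusion(img)
```

Each update is a convex combination of neighbouring values, so mean intensity is preserved and variance never increases, which is what makes such schemes attractive for fully automated illumination smoothing.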

Typical architectures include UNet or Restormer backbones for denoising, attention-driven multi-path generative networks, and transformer-based refinement cascades for feature consistency.

6. Quantitative and Qualitative Performance

Retinex-Diffusion models consistently outperform prior state-of-the-art methods on both paired and unpaired low-light and degraded-illumination benchmarks, as measured by metrics such as PSNR, SSIM, FID, LPIPS, BIQI, LOE, PI, NIQE, UCIQE, and UIQM:

Method / Model                          PSNR (LOL)   SSIM (LOL)   FID (LOL)   LPIPS (LOL)
Diff-Retinex (Yi et al., 2023)               21.98        0.863       47.85         0.048
Reti-Diff (He et al., 2023)                  25.35        0.866          --            --
LightenDiffusion (Jiang et al., 2024)        20.45        0.803          --         0.192
JoReS-Diff (Wu et al., 2023)                 27.63        0.884       42.99         0.090

Retinex-Diffusion approaches uniquely reconstruct plausible textures in extreme low-light (e.g., "hallucinated" details in underexposed areas), achieve natural color correction, and generate realistic shadow and shading gradients during synthesis (Yi et al., 2023, Xing et al., 2024).

7. Limitations, Extensions, and Future Directions

  • Dependence on the pre-trained diffusion model's data distribution limits achievable material/lighting realism for cases absent from training (Xing et al., 2024).
  • Accuracy of Retinex extraction suffers with highly specular or colored lighting due to Gaussian assumptions.
  • DDIM inversion and latent representation may lose fine details; residual blurring possible under severe degradation (He et al., 2023).
  • Retinex-Diffusion pipelines typically do not leverage frequency-domain or multimodal priors (e.g., infrared).
  • Promising extensions include hybrid intrinsic extraction, frequency-domain diffusion, multimodal input, video consistency, and joint learning of material and illumination priors.

Retinex-Diffusion thus constitutes a rigorously validated, physically interpretable, and algorithmically flexible framework for illumination correction, conditional lighting control, and generative restoration in image processing and computer vision (Yi et al., 2023, Xing et al., 2024, Jiang et al., 2024, He et al., 2023, Wu et al., 2023, Nnolim, 2017).
