
Dense-Haze Dataset Benchmark

Updated 15 January 2026
  • Dense-Haze is a benchmark dataset for evaluating single-image dehazing under challenging dense, homogeneous haze conditions using real, radiometrically matched hazy/clear image pairs.
  • The dataset comprises 33 distinct outdoor scenes captured with a Sony A5000 at native 5456×3632 resolution in both RAW and JPEG formats under strictly controlled conditions.
  • It facilitates assessment of both classical and data-driven dehazing methods with evaluation metrics like PSNR, SSIM, and CIEDE2000, highlighting the inversion challenges in dense haze scenarios.

Dense-Haze is a benchmark image dataset specifically constructed for the quantitative evaluation of single-image dehazing algorithms, with a focus on challenging real-world dense and homogeneous haze conditions. Prior attempts to benchmark dehazing solutions were limited by the absence of real image pairs captured with and without haze, or by the predominance of artificially synthesized haze overlays. Dense-Haze directly addresses this gap by introducing high-resolution, radiometrically matched hazy/clear image pairs from real outdoor scenes where haze is generated using professional-grade fog machines. The dataset has become central in validating both classical and data-driven dehazing pipelines, exposing their performance boundaries in regimes where the inversion problem is most ill-posed (Ancuti et al., 2019).

1. Dataset Composition and Scene Characteristics

Dense-Haze consists of 33 unique outdoor scenes, each recorded in two conditions: once under clear atmospheric conditions and once under dense, spatially homogeneous haze. Every capture uses a Sony A5000 camera at native 5456×3632 resolution, stored in both RAW (ARW) and 24-bit JPEG formats. Scene selection prioritizes urban facades, complex vegetative structures, roadways, distant hills, and architectural details, ensuring broad coverage of depth (up to 20–30 m) and texture. Each hazy/clear pair is captured under rigorously matched environmental parameters and exposure settings to preserve pixel-level correspondence between the ground truth and the hazy image. Dense-Haze uniquely targets dense haze, characterized by a nearly uniform scattering layer that severely attenuates contrast even for objects at close range. Haze optical thickness was standardized through harmonized fog-machine operation rather than formal metric bins; thus the haze level is consistent but not numerically specified across samples (Ancuti et al., 2019).

2. Acquisition Protocol and Ground Truth Alignment

The acquisition protocol is designed for radiometric and geometric precision. A fixed tripod mount and remote triggering prevent alignment drift, supported by identical manual settings for aperture, shutter speed, and ISO across all captures of a scene. Custom white balance is established with an 18% gray card and validated with a Macbeth color checker, which appears in every frame. Professional LSM1500-PRO haze machines, each rated at 1,500 W, generate dense vapor with particle diameters of 1–10 μm using a high-density haze fluid, producing haze that is optically thick throughout the 20–30 m scene depth. Density is modulated by machine runtime (2–3 minutes followed by a settling period). Image pairs are captured under overcast skies around sunrise or sunset with wind velocities below 2–3 km/h, preventing lighting and haze-movement artifacts. The pixel-level alignment and radiometric integrity of each hazy/clear pair are validated via the embedded color checker. The haze-free image serves as per-pixel ground truth for all supervised evaluations (Ancuti et al., 2019).
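The gray-card step of the protocol can be sketched in a few lines. The helper below is an illustrative implementation (not part of the dataset's tooling): it estimates per-channel gains from a region covering an 18% gray card and rescales the image so the card renders neutral. The function names and the green-channel anchoring are assumptions for the sketch.

```python
import numpy as np

def gray_card_gains(image, card_box):
    """Estimate per-channel white-balance gains from an 18% gray card.

    image    : float RGB array in [0, 1], shape (H, W, 3)
    card_box : (row0, row1, col0, col1) region covering the gray card
    """
    r0, r1, c0, c1 = card_box
    patch_mean = image[r0:r1, c0:c1].reshape(-1, 3).mean(axis=0)
    # Scale each channel so the card renders neutral (equal R, G, B),
    # anchored to the green channel as is conventional.
    return patch_mean[1] / patch_mean

def apply_white_balance(image, gains):
    # Per-channel gain, clipped back to the valid display range.
    return np.clip(image * gains, 0.0, 1.0)
```

Because the color checker appears in every frame, the same gains can be recovered independently for the hazy and clear capture of a scene, which is what makes the pair radiometrically comparable.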

3. Underlying Physical Model and Benchmark Motivation

Dense-Haze is underpinned by the Koschmieder law for atmospheric scattering, which models the observed image as I(x) = J(x) t(x) + A [1 − t(x)], with t(x) = e^{−β d(x)}, where I(x) is the hazy image, J(x) the clear scene radiance, A the global atmospheric light, t(x) the transmission map, β the scattering coefficient, and d(x) the scene depth. While Dense-Haze does not provide explicit per-pixel depth or atmospheric-light maps, the scene setup realizes a regime of uniformly low t(x), precisely where single-image inversion is most ill-posed and model failure is most probable. This differentiates Dense-Haze from prior benchmarks in which haze is either light or synthetically layered and transmission estimation remains tractable (Ancuti et al., 2019).
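The model above is easy to make concrete. The sketch below (an illustrative implementation, not code distributed with the dataset) applies the Koschmieder equation to a clear image and shows why dense haze is so hostile: with a scattering coefficient of, say, β = 0.2 m⁻¹, transmission at 25 m depth is e^{−5} ≈ 0.007, so the observed image collapses toward the atmospheric light A regardless of scene content.

```python
import numpy as np

def synthesize_haze(J, depth, beta, A=1.0):
    """Apply the Koschmieder model I = J*t + A*(1 - t), t = exp(-beta*depth).

    J     : clear image, float array in [0, 1], shape (H, W) or (H, W, 3)
    depth : per-pixel depth in metres, shape (H, W)
    beta  : scattering coefficient (larger = denser haze)
    A     : global atmospheric light
    """
    t = np.exp(-beta * depth)       # transmission map
    if J.ndim == 3:
        t = t[..., None]            # broadcast over RGB channels
    return J * t + A * (1.0 - t)
```

Inverting this map for J when t(x) is near zero everywhere amplifies noise and estimation error by a factor of roughly 1/t(x), which is the formal sense in which the dense-haze regime is ill-posed.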

4. Evaluation Protocol, Metrics, and Baseline Methods

No fixed dataset split is enforced, but a recommended protocol for machine learning pipelines is a 20/6/7 scene split for training/validation/test. The primary evaluation metrics are Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and CIEDE2000 color difference, each computed against the haze-free ground truth. In the baseline evaluation, seven representative dehazing methods spanning dark channel prior (DCP), color-line, non-local color-cluster, fusion-based, and convolutional neural network (CNN) variants are assessed. Even the strongest baselines attain only roughly 15–16 dB PSNR, with SSIM below 0.80 and CIEDE2000 above 20 in many cases. Qualitative failures are substantial: color shifts in bright haze regions, persistent white-haze halos in CNN outputs, and boundary artifacts near high-contrast edges. These findings demonstrate the disproportionate difficulty posed by dense, homogeneous haze compared to lighter or synthetic scenarios (Ancuti et al., 2019).

5. Usage Recommendations and Identified Limitations

Practitioners are advised to process the provided RAW files and embedded color charts for bespoke white-balance and radiometric calibration, and to complement objective metrics (PSNR, SSIM) with perceptual metrics or user studies, since dehazed outputs frequently exhibit non-obvious color and structural artifacts. For deep learning workflows, fine-tuning on Dense-Haze or employing repeated cross-validation over its 33 real scenes is recommended for domain adaptation; however, the sample size remains modest for data-intensive models, increasing overfitting risk. Dense-Haze is limited to static, daytime, outdoor scenes with uniform dense haze: it omits dynamic content, indoor settings, and night conditions, and lacks ground-truth depth maps or atmospheric-light annotations. Future benchmarks are suggested to incorporate LiDAR or other depth modalities, more diverse scene types (including indoor and nighttime), and multiple haze-density levels (Ancuti et al., 2019).
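The repeated cross-validation suggested above can be sketched with scikit-learn; this is one reasonable setup (the fold counts are our choice, not prescribed by the dataset):

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

scene_ids = np.arange(1, 34)  # the 33 Dense-Haze scene pairs

# 5-fold CV repeated 3 times yields 15 train/test partitions over the
# 33 scenes, reducing the variance of any single small test split.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(scene_ids)):
    train_scenes = scene_ids[train_idx]
    test_scenes = scene_ids[test_idx]
    # fine-tune the dehazing model on train_scenes and
    # evaluate on test_scenes here
```

Splitting at the scene level (rather than by image crops) is important: crops from the same scene share haze conditions and content, so mixing them across train and test leaks information.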

6. Comparative Context and Impact on Algorithm Development

Dense-Haze sets a unique standard for real-world, dense and homogeneous haze benchmarking. In subsequent research, such as “Feature Forwarding for Efficient Single Image Dehazing” (Morales et al., 2019), Dense-Haze is used to reveal that CNN models trained on limited dense-haze samples are vulnerable to overfitting and can produce hallucinated detail in irrecoverable haze regions. Incorporating priors from atmospheric scattering models (e.g., as in DualFastNet) and leveraging architectural features such as pyramid pooling are proposed as mitigation strategies. Subsequent datasets, such as IMFD (Cetinkaya et al., 2023), have addressed some recognized limitations of Dense-Haze by introducing controlled multi-level haze and annotated haze levels, but Dense-Haze continues to represent the canonical testbed for the single-image, outdoor, dense-haze regime.

7. Prospects for Future Benchmarking

Future expansions are identified as necessary for comprehensive benchmarking: inclusion of per-pixel transmission or depth maps; extension to dynamic scenes, mixed meteorological conditions, and indoor or nighttime imagery; and an increase in scene and image count to support robust data-driven method development. Active sensing (e.g., LiDAR), polarization, and multi-spectral imaging are recommended directions. Broader and denser benchmarks, building on the standards established by Dense-Haze, are likely to sharpen the evaluation and advance the development of generalizable single-image dehazing algorithms (Ancuti et al., 2019).
