CUHK-CQRC High-Density Color QR Code Dataset

Updated 19 January 2026

The CUHK-CQRC dataset is a collection of high-density color QR codes captured under diverse real-world conditions, built to evaluate decoding methods systematically.
It comprises high-resolution still photos and live-view frames taken with multiple devices and capture settings, each paired with exact ground-truth annotations.
Benchmark evaluations show that HiQ decoders cut the mean bit error rate by roughly 60% and raise decoding success relative to the baseline PCCC approach under challenging distortions.

The CUHK-CQRC dataset is a large-scale, publicly available benchmark of high-density color QR codes, designed to enable rigorous testing of decoding algorithms under realistic conditions. This collection supports the comparative evaluation of decoders subject to chromatic distortions such as cross-channel interference, illumination variation, and cross-module interference, as well as geometric noise. Developed to address the absence of testbeds for high-capacity color QR codes, CUHK-CQRC enables unbiased, reproducible assessment of methods including the baseline PCCC decoder and the HiQ pipeline with advanced chromatic distortion modeling (Yang et al., 2017).

1. Motivation and Benchmarking Goals

The primary motivation for the CUHK-CQRC dataset stems from the lack of any large-scale, publicly accessible collection of high-density color QR codes acquired under real-world capture conditions. It was conceived as a resource to systematically expose major chromatic distortion effects relevant to color QR code decoding:

Cross-channel interference (CCI): signal contamination across RGB channels, typical in printed color codes.
Illumination variation: shifts in chromatic appearance due to lighting diversity.
Cross-module interference (CMI): a newly identified artifact in dense QR codes wherein color bleeding occurs between adjacent modules.

The dataset is designed to support fair comparisons between existing color-QR decoders (notably PCCC [Blasiński et al. 2013]) and the HiQ decoding framework, targeting reliable, fast decoding for mobile use. The benchmark tests performance under conditions that simultaneously manifest CCI, CMI, and illumination shifts, providing an unbiased ground for method evaluation.

2. Dataset Composition and Specifications

CUHK-CQRC comprises 5,390 color QR code samples split into two main capture modalities:

Still photographs (“photos”): 1,506 RGB-JPEG images, each approximately 2,000×2,000 pixels (8 MP).
Live-view frames (“previews”): 3,884 native YUV_420 (4:2:0) camera frames (~1,200×1,200 px, Android/iOS).
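
To make the 4:2:0 preview format concrete, the sketch below computes the plane sizes of one frame and splits a raw buffer into its Y, U, and V planes. It assumes a tightly packed planar layout (Y plane, then U, then V) with no row padding; the actual byte layout of the dataset's preview files is not stated here, and the function names are hypothetical.

```python
def yuv420_plane_sizes(width, height):
    """Return byte counts (Y, U, V) for one 8-bit 4:2:0 frame.

    Chroma is subsampled by 2 in both directions, so each chroma
    plane holds a quarter as many samples as the luma plane.
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even dimensions")
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma, chroma, chroma


def split_planar_yuv420(buf, width, height):
    """Split a packed planar buffer into (Y, U, V) byte strings.

    Assumption: planes are stored back to back without stride padding.
    """
    y_n, u_n, v_n = yuv420_plane_sizes(width, height)
    if len(buf) != y_n + u_n + v_n:
        raise ValueError("buffer size does not match a 4:2:0 frame")
    return buf[:y_n], buf[y_n:y_n + u_n], buf[y_n + u_n:]


# A blank 1,200 x 1,200 frame occupies 1.5 bytes per pixel.
frame = bytes(1200 * 1200 * 3 // 2)
y_plane, u_plane, v_plane = split_planar_yuv420(frame, 1200, 1200)
print(len(frame), len(y_plane), len(u_plane), len(v_plane))
```

Camera APIs often deliver 4:2:0 data with interleaved chroma or padded rows, so a real loader would need to check the stride information before slicing.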

All samples are 3-layer HiQ codes utilizing 8-color symbols, printed at two resolutions (600 dpi, 1,200 dpi) and at four physical sizes (30, 40, 50, 60 mm per side). Five distinct user content sizes (pre-ECC) are included:

2,787 B
3,819 B
5,196 B
6,909 B
8,859 B

The matrix sizes (QR "versions") used are:

Version 27 (125×125 modules)—for ~2.8 kB and ~3.8 kB payloads
Version 35 (157×157)—for ~5.2 kB and ~6.9 kB
Version 40 (177×177)—for ~8.9 kB

These versions yield module counts from 15,625 (125×125) up to 31,329 (177×177) per code layer.
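
The module counts follow from the standard QR relation between version number and side length, 17 + 4 × version. A minimal sketch that reproduces the figures for the three versions used here (the function name is illustrative):

```python
def modules_per_side(version):
    """Side length, in modules, of a QR symbol of the given version (1-40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions range from 1 to 40")
    return 17 + 4 * version


for version in (27, 35, 40):
    side = modules_per_side(version)
    # Total modules in one layer of the symbol.
    print(version, side, side * side)
```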

3. Acquisition Protocol and Diversity

Data were collected using eight popular smartphone models, balancing sensor characteristics and capture environments:

Device	Sensor Resolution	Image Stabilization
iPhone 6 Plus	8 MP	Optical
iPhone 6	8 MP	Digital
Nexus 4	8 MP	-
Meizu MX2	8 MP	-
OnePlus 1	13 MP	-
Galaxy Nexus 3	5 MP	-
Xperia M2	8 MP	-
Nexus 5	8 MP	Optical

Native autofocus and default auto-exposure/white-balance settings were used. Environmental conditions were engineered to reflect indoor daylight fluorescent (~400 lx), incandescent (~300 lx), outdoor sunny (~10,000 lx), outdoor cloudy (~5,000 lx), uniform shadow (~100 lx), and spotty/non-uniform shadows. Geometric diversity was induced via angle (±30° pitch/yaw), capture distance (20–50 cm), and background control (plain surfaces). For each sample, permutations of print size, dpi, QR content, layer-color mapping, device, lighting, and image mode were applied to maximize variability.

4. Annotation, Ground Truth, and Distortion Characteristics

For every instance, the exact bit sequence is available (as produced by the generator). Pixel–module correspondences were extracted using a human-assisted procedure: the four finder corners are clicked manually, a homography is then estimated automatically, and the central pixel of each module is sampled. This process generated a training set of roughly 600,000 color–codeword pairs. All artifacts, including CCI, illumination shifts, CMI from ink bleed, blur, and camera noise, are intrinsic to the captures; no synthetic augmentation was applied.
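
The corner-click, homography, and center-sampling steps can be sketched as follows. This is an illustration under stated assumptions, not the dataset's actual annotation tool: it assumes the four clicked points are the outer corners of the symbol given in the order top-left, top-right, bottom-right, bottom-left, and it uses the closed-form unit-square-to-quadrilateral projective mapping rather than a general least-squares homography fit. All function names are hypothetical.

```python
def square_to_quad(quad):
    """Coefficients (a..h) of the projective map from the unit square
    to `quad`, whose corners correspond to (0,0), (1,0), (1,1), (0,1)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    dx1, dx2, sx = x1 - x2, x3 - x2, x0 - x1 + x2 - x3
    dy1, dy2, sy = y1 - y2, y3 - y2, y0 - y1 + y2 - y3
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        raise ValueError("degenerate quadrilateral")
    g = (sx * dy2 - dx2 * sy) / det
    h = (dx1 * sy - sx * dy1) / det
    return (x1 - x0 + g * x1, x3 - x0 + h * x3, x0,
            y1 - y0 + g * y1, y3 - y0 + h * y3, y0, g, h)


def project(coeffs, u, v):
    """Map unit-square coordinates (u, v) to image coordinates."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * u + h * v + 1.0
    return ((a * u + b * v + c) / w, (d * u + e * v + f) / w)


def module_centers(quad, n):
    """Image coordinates of the center of every module in an n x n grid."""
    coeffs = square_to_quad(quad)
    return [[project(coeffs, (col + 0.5) / n, (row + 0.5) / n)
             for col in range(n)]
            for row in range(n)]


# Example: a fronto-parallel Version 27 code, one pixel per module.
corners = [(10.0, 20.0), (135.0, 20.0), (135.0, 145.0), (10.0, 145.0)]
centers = module_centers(corners, 125)
print(centers[0][0])  # center of the top-left module
```

Sampling the image at each returned coordinate, for example by rounding to the nearest pixel, would give the per-module color that is paired with the known codeword bits.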

Statistical breakdown:

Illumination: 24% indoor fluorescent, 18% incandescent, 22% outdoor sunny, 19% outdoor cloudy, 17% shadowed.
Chromatic interference: All samples display pronounced CCI + CMI; substantial RGB-space cluster overlap is visible, especially under fluorescent and incandescent lighting.

5. Evaluation Metrics and Performance

Two standard metrics are used to report performance:

Decoding Success Rate (DSR):

$\mathrm{DSR} = \frac{\text{\# codes successfully decoded after ECC}}{\text{total codes tested}}$

Bit Error Rate (BER):

$\mathrm{BER} = \frac{1}{N\,L} \sum_{i=1}^{N} \left\| \mathbf{b}_i^{\text{decoded}} - \mathbf{b}_i^{\text{ground-truth}} \right\|_1$

where $N$ is the number of QR samples and $L$ is the data bit length per sample.
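
A minimal sketch of how the two metrics could be computed, assuming each code's bits are available as equal-length sequences of 0s and 1s and that a per-code success flag is recorded after error correction. The function names and toy inputs are illustrative only.

```python
def bit_error_rate(decoded, truth):
    """Fraction of mismatched bits over all samples.

    `decoded` and `truth` are parallel lists of bit sequences. When every
    sample has the same length L, this equals the 1/(N*L) sum of L1
    distances between decoded and ground-truth bit vectors.
    """
    errors = 0
    total = 0
    for d_bits, t_bits in zip(decoded, truth):
        if len(d_bits) != len(t_bits):
            raise ValueError("decoded and ground-truth lengths differ")
        errors += sum(d != t for d, t in zip(d_bits, t_bits))
        total += len(t_bits)
    return errors / total


def decoding_success_rate(success_flags):
    """Share of codes that decoded correctly after error correction."""
    return sum(1 for ok in success_flags if ok) / len(success_flags)


# Toy inputs: two 4-bit codes with a single flipped bit in the first.
decoded = [[0, 1, 1, 0], [1, 1, 0, 0]]
truth = [[0, 1, 0, 0], [1, 1, 0, 0]]
print(bit_error_rate(decoded, truth))
print(decoding_success_rate([True, False, True, True]))
```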

Key comparative results (Setting 1, full pixel recovery). Failures are reported as the decoding failure rate, DFR = 1 − DSR:

PCCC baseline: mean BER ≈ 10.7%; DFR = 84%
PCCC + robust geometric transform (RGT): BER ≈ 8.8% (–18% rel.); DFR = 72% (–12 pp)
HiQ (QDA): BER ≈ 4.3% (–60% rel.); DFR = 54% (+188% relative success)
HiQ (LSVM): BER ≈ 4.0%; DFR = 65%
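
The relative figures quoted above can be checked with a few lines of arithmetic. The sketch assumes DFR denotes the decoding failure rate, so that success rate = 100% − DFR; under that reading the PCCC baseline succeeds on 16% of codes and HiQ (QDA) on 46%.

```python
def relative_change(old, new):
    """Signed change from `old` to `new` as a fraction of `old`."""
    return (new - old) / old


# BER in percent: PCCC baseline vs. HiQ (QDA).
ber_change = relative_change(10.7, 4.3)

# Success rates in percent, derived from the reported DFR values.
pccc_success = 100 - 84
hiq_qda_success = 100 - 54
success_gain = relative_change(pccc_success, hiq_qda_success)

print(f"BER change: {ber_change:.1%}")
print(f"Success gain: {success_gain:.1%}")
```

The same helper gives roughly −18% for the PCCC + RGT BER (10.7% to 8.8%).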

CMI-cancellation (Setting 2, center pixel only):

QDA → QDA-CMI: BER 3.96%→3.70% (–6.6% relative)
LSVM → LSVM-CMI: BER 4.30%→3.58% (–16.7% relative)
LSVM DFR: 65%→56% (–9 pp)

With HiQ (LSVM-CMI) on iPhone 6 Plus, mean decoding time is ~234 ms/frame; one successful scan for a 3.8 kB (35×35 mm) code averages ≈264 ms total.

Footprint vs. capacity with HiQ decoder:

2,900 B in 26×26 mm
7,700 B in 38×38 mm
8,900 B in 42×42 mm

All decoded in under 3 seconds on commodity smartphones.

6. Dataset Structure, Distribution, and Licensing

CUHK-CQRC is hosted at http://www.authpaper.net/colorDatabase/index.html with the following structure:

/photos/: JPEG still images (one per photo)
/previews/: YUV_420 video frames (one per preview)
metadata.csv: per-sample metadata including sample_id, layer, version, print_size_mm, dpi, lighting, device, file path, content bytes, ground truth hex string
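
A hedged sketch of reading such a metadata file and turning the ground-truth hex string into bits. The exact header names in metadata.csv are not given here, so the column names below (for example `file_path`, `content_bytes`, `ground_truth_hex`) are assumptions, and the two rows are made-up placeholders rather than real dataset entries.

```python
import csv
import io
from collections import Counter

# Placeholder content standing in for metadata.csv; header names and
# values are hypothetical and only mirror the fields listed above.
SAMPLE = """sample_id,layer,version,print_size_mm,dpi,lighting,device,file_path,content_bytes,ground_truth_hex
demo-0001,3,27,30,600,fluorescent,Nexus 5,photos/demo-0001.jpg,2787,a5
demo-0002,3,40,60,1200,outdoor sunny,iPhone 6,previews/demo-0002.yuv,8859,0f
"""


def hex_to_bits(hex_string):
    """Expand a hex string into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in bytes.fromhex(hex_string)
            for shift in range(7, -1, -1)]


def load_rows(handle):
    """Parse metadata rows and convert the numeric fields to integers."""
    rows = []
    for row in csv.DictReader(handle):
        for key in ("version", "print_size_mm", "dpi", "content_bytes"):
            row[key] = int(row[key])
        row["ground_truth_bits"] = hex_to_bits(row["ground_truth_hex"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
per_lighting = Counter(row["lighting"] for row in rows)
print(per_lighting)
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the header names adjusted to match.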

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0).

7. Impact, Key Findings, and Research Directions

CUHK-CQRC is described as the first large-scale, public, heterogeneous dataset for stress-testing color QR decoders against the principal forms of chromatic distortion and geometric noise. Comparative analyses reveal major limitations in the prior PCCC approach, particularly under high-density, real-world capture conditions. The HiQ pipeline, which combines QDA or LSVM classification with CMI cancellation, decodes robustly and quickly: relative to the PCCC baseline, it raises the decoding success rate by 188% (from 16% to 46%) and lowers BER by about 60%.

These results suggest room for further development of color QR decoding algorithms optimized for mobile deployment, especially ones that address the chromatic interference and geometric distortions typical of practical capture. The dataset supports continued method refinement and reproducible benchmarking in high-capacity, portable data encoding applications (Yang et al., 2017).


References

Yang et al. (2017). Robust and Fast Decoding of High-Capacity Color QR Codes for Mobile Applications.
