DeepPCB: PCB Defect Detection Dataset
- DeepPCB is a comprehensive dataset featuring 1,500 image pairs and full-board variants with six annotated defect categories for PCB defect detection.
- The dataset enables rigorous evaluation of detection, classification, and registration methods using metrics such as mAP and per-class precision.
- Advanced preprocessing and augmentation techniques, including template matching, global binarization, and random rotations, support robust model development.
DeepPCB is a dataset designed for benchmarking and developing printed circuit board (PCB) defect detection algorithms, particularly deep learning-based methods. It enables rigorous evaluation of object detection, classification, and registration methods on PCB imagery, and provides annotated defects together with standardized experimental protocols. The dataset was introduced to address limitations of earlier PCB inspection datasets, such as lack of public availability, insufficient annotation precision, and limited defect diversity (Tang et al., 2019; Huang et al., 2019).
1. Dataset Composition and Statistics
DeepPCB comprises 1,500 image pairs, each consisting of a defect-free template image and a corresponding tested image of the same PCB, both at 640×640 pixel resolution. Defect annotations span six common fault categories: open, short, mousebite, spur, pin-hole, and spurious copper. Each tested image typically contains 3 to 12 defect instances distributed among these classes. Approximate per-class totals are:
| Defect Type | Total Count (~) |
|---|---|
| Open | 3,200 |
| Short | 3,100 |
| Mousebite | 3,800 |
| Spur | 3,600 |
| Spurious copper | 3,900 |
| Pin-hole | 3,700 |
The dataset split consists of 1,000 image pairs for training and 500 for testing. No explicit validation split is included, but users may allocate 10–20% of training data for that purpose. Images are stored as “template” (defect-free) and “tested” (potentially defective) pairs, supporting comparative algorithms that operate on aligned templates and samples (Tang et al., 2019).
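Since no validation split ships with the dataset, one way to carve one out of the training pairs is sketched below. This is a minimal illustration, not part of the DeepPCB distribution: the function name `split_train_val`, the zero-padded pair IDs, and the fixed seed are all assumptions made for the example.

```python
import random

def split_train_val(pair_ids, val_fraction=0.1, seed=0):
    """Shuffle training pair IDs and hold out a fraction for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    ids = sorted(pair_ids)          # sort first so the result is reproducible
    rng = random.Random(seed)       # local RNG; global random state is untouched
    rng.shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]  # (train, val)

# Hypothetical IDs standing in for the 1,000 training pairs.
train_ids = [f"{i:04d}" for i in range(1, 1001)]
train, val = split_train_val(train_ids, val_fraction=0.2)
print(len(train), len(val))
```

Splitting by pair ID keeps each template with its tested image, so no pair is divided between the training and validation subsets.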
A related variant of DeepPCB described in (Huang et al., 2019) contains 1,386 full-board RGB images (4608×3456 px), with similar defect categories. Defects are simulated onto photographs of ten standard PCB templates, and images are annotated following the Pascal VOC XML schema.
2. Defect Taxonomy and Annotation Methodology
DeepPCB enumerates six defect types, captured via manual annotation and augmented through artificial “defect stamping” to ensure consistent sample numbers:
- Open: Interruption or absence in a copper trace.
- Short: Unintended electrical connection bridging two conductors.
- Mousebite: Notches or missing segments at trace borders.
- Spur: Small copper protrusions from traces.
- Pin-hole: Small circular omissions within conductive areas.
- Spurious copper: Isolated copper blobs not part of the intended design.
Annotations employ axis-aligned bounding boxes represented as $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$, with integer class labels (1–6, with 0 reserved for background). Templates are manually verified as defect-free and are aligned to tested samples by template matching to correct for planar misalignment. Defect regions are then marked, and every annotation is verified by at least two human annotators. Illumination artifacts are suppressed by stringent thresholding and global binarization during annotation (Tang et al., 2019).
In the synthesized dataset version (Huang et al., 2019), orientation variation is introduced via random rotation (–180° to +180°), with per-image .txt files encoding ground-truth angles.
3. File Structure and Access Protocol
DeepPCB is distributed via GitHub (https://github.com/tangsanli5201/DeepPCB), with the following canonical structure:

    DeepPCB/
    ├─ images/
    │  ├─ train/
    │  │  ├─ 0001_template.png
    │  │  ├─ 0001_tested.png
    │  └─ test/
    │     ├─ 1001_template.png
    │     ├─ 1001_tested.png
    └─ annotations/
       ├─ train/
       │  ├─ 0001.txt
       └─ test/
          ├─ 1001.txt

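Assuming the directory layout shown above, template/tested pairs and their annotation files could be enumerated as in the sketch below. The helper name `list_pairs` is hypothetical, and the layout of a downloaded copy should be checked against the repository before relying on these paths.

```python
from pathlib import Path

def list_pairs(root, split):
    """Yield (template, tested, annotation) paths for one split.

    Assumes images/<split>/<id>_template.png, images/<split>/<id>_tested.png
    and annotations/<split>/<id>.txt, as in the tree above.
    """
    image_dir = Path(root) / "images" / split
    ann_dir = Path(root) / "annotations" / split
    suffix = "_tested.png"
    for tested in sorted(image_dir.glob(f"*{suffix}")):
        pair_id = tested.name[: -len(suffix)]
        template = image_dir / f"{pair_id}_template.png"
        annotation = ann_dir / f"{pair_id}.txt"
        # Skip incomplete pairs rather than failing halfway through a run.
        if template.exists() and annotation.exists():
            yield template, tested, annotation
```

A call such as `list(list_pairs("DeepPCB", "train"))` would then return one tuple per complete pair, sorted by ID.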
Annotation files are line-delimited with the format:

    class_id x_min y_min x_max y_max

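A minimal parser for this line format might look like the following. It assumes whitespace-separated integer fields in exactly the order given above and class IDs 1 to 6; the `Defect` record, the function name, and the sample coordinates are illustrative only, and the field order should be verified against the downloaded files.

```python
from typing import List, NamedTuple

class Defect(NamedTuple):
    class_id: int
    x_min: int
    y_min: int
    x_max: int
    y_max: int

def parse_annotations(text: str, num_classes: int = 6) -> List[Defect]:
    """Parse lines of 'class_id x_min y_min x_max y_max' into Defect records."""
    defects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        class_id, x_min, y_min, x_max, y_max = map(int, fields)
        if not 1 <= class_id <= num_classes:
            raise ValueError(f"line {line_no}: class_id {class_id} out of range")
        if x_min > x_max or y_min > y_max:
            raise ValueError(f"line {line_no}: box corners are out of order")
        defects.append(Defect(class_id, x_min, y_min, x_max, y_max))
    return defects

# Made-up coordinates, used only to exercise the parser.
sample = "1 120 45 150 80\n3 300 310 322 341\n"
for defect in parse_annotations(sample):
    print(defect)
```

Validating the class range and corner order at load time surfaces malformed files early instead of producing degenerate boxes during training.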
4. Benchmarking and Evaluation Protocols
DeepPCB supports evaluation using mean Average Precision (mAP) at a fixed Intersection-over-Union (IoU) threshold, averaged across the 6 defect classes:
$$\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c$$
where $C = 6$ and $\mathrm{AP}_c$ is the average precision for class $c$. The precision/recall F-measure is also reported.
Using the default VGG16-tiny backbone with Group Pyramid Pooling (GPP, max pooling), the proposed detector achieves:
- mAP: 98.6% @ 62 FPS
- Per-class AP: open 98.5%, short 98.5%, mousebite 99.1%, spur 98.2%, spurious copper 98.5%, pin-hole 99.4%
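The two ingredients of this metric, box overlap and the class-wise mean, can be sketched as follows. The function names are illustrative, and the IoU helper treats boxes as continuous `(x_min, y_min, x_max, y_max)` extents, which is an assumption about the coordinate convention. The per-class AP values are the ones listed above.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Per-class AP (%) as reported for the max-pooling detector.
reported_ap = {
    "open": 98.5, "short": 98.5, "mousebite": 99.1,
    "spur": 98.2, "spurious copper": 98.5, "pin-hole": 99.4,
}

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))             # overlap 50 / union 150
print(round(mean_average_precision(reported_ap), 1))   # 98.7
```

The mean of the rounded per-class figures comes to 98.7, within about 0.1 percentage point of the reported 98.6% mAP.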
Comparative mAP results using the same test split:
| Method | mAP |
|---|---|
| Image Processing | 89.3% |
| YOLO | 92.6% |
| SSD | 95.9% |
| Faster R-CNN | 97.6% |
| Ours (avg pooling) | 97.1% |
| Ours (max pooling) | 98.6% |
The synthesized dataset (Huang et al., 2019) enables registration (rotation/affine estimation), detection (localization error), and classification (per-class precision, AP) tasks. For reference-based methods (SURF + adaptive threshold + XOR + morphology), detection error rates are <0.2% for all defect classes. CNN-based classifiers achieve per-class AP at the 97% level on test crops.
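The XOR step of such a reference-based comparison can be illustrated on a toy example. The sketch below is a deliberate simplification and not the published pipeline: it uses a fixed global threshold in place of adaptive thresholding, assumes the two images are already registered (no SURF step), and omits morphological cleanup. Images are plain nested lists so that the example has no dependencies, and all function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Global binarization: 1 where the pixel is at or above the threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]

def xor_diff(template_bin, tested_bin):
    """Pixel-wise XOR of two aligned binary images of identical shape."""
    if len(template_bin) != len(tested_bin) or any(
        len(a) != len(b) for a, b in zip(template_bin, tested_bin)
    ):
        raise ValueError("images must have the same shape")
    return [[a ^ b for a, b in zip(row_t, row_s)]
            for row_t, row_s in zip(template_bin, tested_bin)]

def bounding_box(mask):
    """Tight (x_min, y_min, x_max, y_max) around nonzero pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)

# Toy 4x6 "board": one horizontal copper trace on row 2.
template = [[0] * 6, [0] * 6, [200] * 6, [0] * 6]
tested = [row[:] for row in template]
tested[2][3] = 0   # break the trace at x = 3, imitating an "open" defect

diff = xor_diff(binarize(template), binarize(tested))
box = bounding_box(diff)
print(box)   # (3, 2, 3, 2)
```

Pixels that differ between template and tested image survive the XOR, so any nonzero region in `diff` marks a candidate defect. In practice residual misalignment also produces nonzero pixels, which is why registration and morphological filtering appear in the full pipeline.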
5. Preprocessing and Data Augmentation
DeepPCB employs multiple preprocessing and augmentation routines:
- Offline: Template matching for geometric alignment and global binarization for noise suppression.
- Online: Synchronized horizontal/vertical flips (probability 0.5) and random cropping during minibatch training. No photometric distortions or color transformations are applied in standard benchmarks.
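The synchronized flips listed above can be sketched as follows: the same random decision is applied to the template, the tested image, and the bounding boxes so that all three stay consistent. This is an illustrative sketch on nested lists rather than the authors' training code. It assumes boxes are `(class_id, x_min, y_min, x_max, y_max)` with inclusive pixel indices (for edge-based coordinates, `width - x` would replace `width - 1 - x`), and it leaves out random cropping.

```python
import random

def hflip(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror an image (list of rows) top to bottom."""
    return [row[:] for row in img[::-1]]

def augment_pair(template, tested, boxes, rng, p=0.5):
    """Apply identical random flips to both images and to the boxes."""
    height, width = len(tested), len(tested[0])
    if rng.random() < p:   # horizontal flip, shared by both images
        template, tested = hflip(template), hflip(tested)
        boxes = [(c, width - 1 - x1, y0, width - 1 - x0, y1)
                 for c, x0, y0, x1, y1 in boxes]
    if rng.random() < p:   # vertical flip, shared by both images
        template, tested = vflip(template), vflip(tested)
        boxes = [(c, x0, height - 1 - y1, x1, height - 1 - y0)
                 for c, x0, y0, x1, y1 in boxes]
    return template, tested, boxes

# Toy 2x3 images; the box covers the pixels holding 1 and 2 in `tested`.
template = [[0, 0, 0], [0, 0, 0]]
tested = [[1, 2, 3], [4, 5, 6]]
boxes = [(1, 0, 0, 1, 0)]

# p=1.0 forces both flips so the result is deterministic.
_, flipped, flipped_boxes = augment_pair(template, tested, boxes,
                                         random.Random(0), p=1.0)
print(flipped)         # [[6, 5, 4], [3, 2, 1]]
print(flipped_boxes)   # [(1, 1, 1, 2, 1)]
```

After both flips the box still covers the pixels holding 2 and 1, which confirms that images and annotations were transformed together.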
The synthesized set (Huang et al., 2019) includes orientation augmentation via random large-angle rotations.
6. Applications and Research Significance
DeepPCB is utilized primarily for the development and comparison of deep object detection architectures adapted for PCB fault inspection. Its dual-image input (template + tested) supports algorithms exploiting PCB repeatability and structural alignment for fine-grained defect localization. The high annotation density, object-level bounding boxes, artificial and real defects, and established benchmarks provide a reproducible experimental foundation. DeepPCB also supports registration and patch-level classification pipelines (Tang et al., 2019, Huang et al., 2019).
7. Limitations and Practical Considerations
No explicit validation split is specified in the canonical distribution of DeepPCB; empirical protocols may adopt cross-validation or hold out part of the training data for early stopping. Color augmentation, domain adaptation, and tile-based multi-scale testing are not benchmarked in published results. Although the dataset includes artificially added defects for completeness, these synthetic defects may not fully capture manufacturing variability.
DeepPCB is publicly accessible and free for research use, but users should consult the hosting repository for any updates regarding versioning or usage guidelines. The dataset supports both detection and classification benchmarks using its annotation and evaluation protocols.
References:
- (Tang et al., 2019) "Online PCB Defect Detector On A New PCB Defect Dataset"
- (Huang et al., 2019) "A PCB Dataset for Defects Detection and Classification"