
Car Damage Dataset (CDD) Overview

Updated 18 February 2026
  • Car Damage Dataset (CDD) is a curated image dataset with precise annotations for various damage types and vehicle parts, supporting computer vision research.
  • Methodologies across CDD variants include bounding boxes, polygon masks, and synthetic augmentation, enabling benchmark testing for detection and segmentation tasks.
  • Applications of CDD span automated insurance claims, fraud prevention, and fine-grained damage segmentation, driving improvements in model robustness and domain adaptation.

A Car Damage Dataset (CDD) is a curated image dataset with ground-truth annotations for various types of visual damage, defects, and vehicle parts captured under real-world or simulated conditions. Such datasets are critical for developing, benchmarking, and validating computer vision models for automated vehicular damage assessment, insurance claim automation, fraud detection, and related tasks. Multiple datasets bearing the name or abbreviation "CDD" have been published since 2018, exhibiting significant variation in scope, annotation granularity, and benchmarking protocols. The following sections synthesize the defining characteristics, methodologies, and research applications of these datasets as reported in peer-reviewed publications, preprints, and dataset documentation.

1. Dataset Scope, Structure, and Variants

Car Damage Datasets have been released with variable scope, damage taxonomies, and annotation protocols, reflecting evolving goals in damage identification, fraudulent-claim detection, and automated assessment.

  • The first CDD, released by (Li et al., 2018), comprises ≈2 170 images: 1 790 web-scraped images of individual damage instances and ≈380 images of 92 distinct vehicles in a parking lot, each imaged multiple times for claim/fraud scenarios. Damage is categorized as scratch, dent, or crack with bounding-box (⟨xmin, ymin, xmax, ymax⟩) annotations and no segmentation masks.
  • A more recent variant, named CDD in the context of (Panboonyuen, 12 Jun 2025), includes 12 000 high-resolution images annotated with 26 real damage types, 7 fake-damage types, and 61 vehicle part classes. This dataset uses instance-level polygon segmentation (COCO-style JSON) with a mask and bounding box per object or region and is accompanied by an extensive annotation protocol.
  • Other published datasets (e.g., the "CDD" in (Baig et al., 21 Aug 2025)) focus on narrow domains such as minor dent detection, with 2 241 images, single-class bounding-box annotations, and highly controlled acquisition protocols.
  • Reference datasets for comparison, such as CarDD (Wang et al., 2022) and CrashCar101 (Parslov et al., 2023), cover 4 000–101 050 images, broader class sets (up to 6 real types in CarDD, 5 in CrashCar101), and polygon-level segmentation (CarDD) or synthetic, pixel-accurate masks (CrashCar101).
| Dataset (citation) | Size | Damage Classes | Annotation Type | Modes/Protocols |
|---|---|---|---|---|
| CDD (Li et al., 2018) | ≈2 170 | 3 (scratch, dent, crack) | Bounding box | Real/fraud protocols |
| CDD (Panboonyuen, 12 Jun 2025) | 12 000 | 26 real, 7 fake, 61 parts | Polygon masks (COCO) | Multi-task, multi-label |
| CDD (Baig et al., 21 Aug 2025) | 2 241 | 1 (dent) | Bounding box (YOLOv8) | Single-class |
| CarDD (Wang et al., 2022) | 4 000 | 6 | Polygon (COCO/SOD) | Detection/segmentation |
| CrashCar101 (Parslov et al., 2023) | 101 050 | 5 | Pixel segmentation | Synthetic, domain random. |
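The COCO-style instance annotation layout used by the polygon-annotated variants above can be sketched as follows. The file name, ids, category names, and coordinates here are hypothetical illustrations, not the actual schema of any CDD release; the original annotation files should be consulted for the exact fields.

```python
import json

# Toy COCO-style annotation payload for a single image. Real datasets
# would be loaded with json.load(open("annotations.json")) instead.
coco_json = """
{
  "images": [{"id": 1, "file_name": "car_0001.jpg", "width": 1920, "height": 1080}],
  "categories": [{"id": 7, "name": "scratch"}, {"id": 12, "name": "front_bumper"}],
  "annotations": [
    {"id": 101, "image_id": 1, "category_id": 7,
     "segmentation": [[100.0, 200.0, 180.0, 200.0, 180.0, 260.0, 100.0, 260.0]],
     "bbox": [100.0, 200.0, 80.0, 60.0]}
  ]
}
"""
coco = json.loads(coco_json)

def instances_for_image(coco_dict, image_id):
    """Return (category_name, bbox, polygon) triples for one image."""
    cats = {c["id"]: c["name"] for c in coco_dict["categories"]}
    return [
        (cats[a["category_id"]], a["bbox"], a["segmentation"])
        for a in coco_dict["annotations"]
        if a["image_id"] == image_id
    ]

name, bbox, polygon = instances_for_image(coco, 1)[0]
print(name, bbox)  # -> scratch [100.0, 200.0, 80.0, 60.0]
```

Note that COCO bounding boxes are stored as [x, y, width, height], whereas the bounding-box-only datasets above use corner coordinates; converters must account for this.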

2. Damage Taxonomies and Annotation Schemes

The definition of damage classes and annotation detail is a key differentiator among datasets.

Advances in annotation complexity (polygons, part-damage links, the fake/real distinction) reflect rising ambitions for multi-task models and robustness in practical deployment.

3. Data Acquisition, Environments, and Augmentation

Acquisition protocols and dataset diversity directly influence model generalization.

Dataset heterogeneity in environmental context and acquisition (real vs. synthetic) is a recognized driver of domain generalization for segmentation and detection models.
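As a toy illustration of the augmentation side of this point, the following sketch applies bounding-box-aware flip and scale jitter, the kind of geometric transform used to diversify acquisition conditions. The ⟨xmin, ymin, xmax, ymax⟩ convention matches the bounding-box datasets above; the function name, defaults, and ranges are illustrative assumptions, not part of any published CDD pipeline.

```python
import random

def flip_and_scale(bbox, img_w, scale_range=(0.8, 1.2), flip_p=0.5):
    """Toy augmentation: optional horizontal flip, then uniform scale jitter.
    bbox is (xmin, ymin, xmax, ymax); img_w is the image width in pixels."""
    xmin, ymin, xmax, ymax = bbox
    if random.random() < flip_p:
        # Mirror the box about the vertical axis; y coordinates are unchanged.
        xmin, xmax = img_w - xmax, img_w - xmin
    s = random.uniform(*scale_range)  # resize image and box by the same factor
    return tuple(round(v * s, 1) for v in (xmin, ymin, xmax, ymax))

# Deterministic corner cases: no flip / guaranteed flip, no scale jitter.
print(flip_and_scale((100, 200, 300, 400), 1920, (1.0, 1.0), flip_p=0.0))
print(flip_and_scale((100, 200, 300, 400), 1920, (1.0, 1.0), flip_p=1.0))
```

In practice, libraries such as Albumentations or torchvision transforms handle these geometric updates (including polygon masks) automatically.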

4. Benchmarking Protocols and Evaluation Metrics

Evaluation metrics and benchmarking tasks are adapted to detection, segmentation, and retrieval-focused use cases:

  • Detection and Segmentation: Standard object detection and instance segmentation metrics are used: Intersection-over-Union (IoU), precision/recall (Precision = TP/(TP+FP)), F1-score, mean Average Precision (mAP), and mask AP (e.g., COCO-style, (Wang et al., 2022, Panboonyuen, 12 Jun 2025)).
  • Fraud Detection/Retrieval: Older protocols (Li et al., 2018) apply nearest-neighbor matching with deep feature fusion (VGG-16, histograms) and rank-based retrieval accuracy (Rank-1, Rank-10).
  • Small-object handling: Segmentation benchmarks distinguish AP_S (for small objects), showing significant gains from enhancements such as multi-scale augmentation and focal loss (Wang et al., 2022).
  • Few-shot Sim2Real: Synthetic datasets (Parslov et al., 2023) demonstrate improved mIoU under few-shot fine-tuning, quantifying domain adaptation efficacy.
  • Multi-task Learning: Accuracy, precision, recall, and F1-score per class/task are published for both damage and part segmentation (e.g., ALBERT-V9D/V9P in (Panboonyuen, 12 Jun 2025)).
| Task | Metric | Cited Example |
|---|---|---|
| Damage Detection | Precision, Recall, mAP@0.5 | (Baig et al., 21 Aug 2025, Wang et al., 2022) |
| Instance Segmentation | AP_mask, AP_50, AP_S | (Wang et al., 2022, Panboonyuen, 12 Jun 2025) |
| Sim2Real Transfer (few-shot) | mIoU | (Parslov et al., 2023) |
| Fraudulent Claim Retrieval | Rank-1, Rank-10 Accuracy | (Li et al., 2018) |
| Part Segmentation | mIoU, Accuracy, F1 | (Parslov et al., 2023, Panboonyuen, 12 Jun 2025) |
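The core detection metrics above can be sketched in a few lines, assuming axis-aligned (xmin, ymin, xmax, ymax) boxes and greedy one-to-one matching of predictions to ground truth at an IoU threshold of 0.5 (the mAP@0.5 convention). This is a simplified illustration; full COCO evaluation additionally sweeps confidence and IoU thresholds.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes in (xmin, ymin, xmax, ymax) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(preds, gts, thr=0.5):
    """Greedy matching: each prediction claims at most one unmatched
    ground-truth box with IoU >= thr.  Precision = TP/(TP+FP)."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall

# One correct detection, one false positive, one missed ground truth:
print(precision_recall([(0, 0, 10, 10), (50, 50, 60, 60)],
                       [(0, 0, 10, 10), (20, 20, 30, 30)]))  # -> (0.5, 0.5)
```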

5. Applications and Model Benchmarking

CDD variants support a range of computer vision tasks fundamental to automotive inspection.

Baseline results are available for each variant. For example, on the CDD of (Panboonyuen, 12 Jun 2025), the ALBERT-V9D model achieves 0.9472 accuracy and 0.8926 F1 on 26-class damage classification; Mask R-CNN, DCN, and DCN⁺ yield AP_mask of 49.4, 52.5, and 57.0, respectively, on CarDD (Wang et al., 2022).
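The rank-based accuracy used in the fraud-retrieval protocol of (Li et al., 2018) can be sketched as follows. The feature vectors below are toy stand-ins for the fused VGG-16/histogram descriptors, and the function name and distance choice are illustrative assumptions.

```python
import math

def rank_k_accuracy(query_feats, query_ids, gallery_feats, gallery_ids, k=1):
    """Fraction of queries whose true identity appears among the top-k
    nearest gallery entries by Euclidean distance over feature vectors."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    hits = 0
    for q, qid in zip(query_feats, query_ids):
        ranked = sorted(range(len(gallery_feats)),
                        key=lambda i: dist(q, gallery_feats[i]))
        if qid in {gallery_ids[i] for i in ranked[:k]}:
            hits += 1
    return hits / len(query_feats)

gallery = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
ids = ["a", "b", "c"]
print(rank_k_accuracy([(0.1, 0.1)], ["a"], gallery, ids, k=1))  # -> 1.0
```

Rank-10 accuracy is the same quantity with k=10; a query succeeds if the true vehicle appears anywhere in the first ten retrieved matches.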

6. Data Access, Licensing, and Limitations

Data accessibility and use restrictions vary substantially.

Identified limitations include constrained class coverage (single-class in some variants), absence of segmentation masks or stereo data, modest size relative to future needs, and persistent domain gaps between synthetic and real-world images. Recommendations across recent literature emphasize multi-class, multi-modality (mask, bounding box, keypoint), fake-damage realism, and improved environmental diversity for robust insurance, autonomous, and fleet applications.

7. Comparative Context and Future Directions

CDD and its analogues are part of an expanding corpus of car damage datasets that enable comprehensive benchmarking and new methodologies in vision-based automotive inspection.

Key trends include growing dataset scale (up to the 101 050 synthetic images of CrashCar101), a shift from bounding boxes toward instance-level polygon and pixel-accurate segmentation, explicit modeling of fake damage for fraud detection, and multi-task benchmarks linking damage and part annotations.

Researchers are encouraged to select datasets and protocols aligned with task requirements (fraud detection, fine-grained segmentation, sim2real adaptation) and to reference the original dataset sources for detailed annotation schemas, licensing, and access conditions (Li et al., 2018, Panboonyuen, 12 Jun 2025, Wang et al., 2022, Parslov et al., 2023, Baig et al., 21 Aug 2025).
