CrashSplat: 3D Vehicle Damage Assessment
- CrashSplat is an end-to-end pipeline that uplifts 2D damage masks into 3D Gaussian Splatting representations, enabling geometrically accurate damage localization.
- It integrates 2D instance segmentation, Structure-from-Motion, and a learning-free 3D segmentation algorithm to achieve fast, view-consistent damage assessment.
- The system demonstrates competitive segmentation performance while offering sub-second, CPU-only inference for practical vehicle damage analysis.
CrashSplat is an end-to-end pipeline for vehicle damage assessment that performs 3D segmentation of damages by uplifting 2D masks to 3D Gaussian Splatting (3D-GS) representations. It is specifically designed to overcome limitations inherent in existing 2D approaches, notably the inability to provide geometrically faithful localization, cross-view consistency, and robustness to damages visible only in a single view. CrashSplat features an automatic pipeline composed of four principal stages: per-frame 2D instance segmentation, Structure-from-Motion (SfM) to estimate camera geometry, 3D Gaussian Splatting reconstruction, and a learning-free, single-view segmentation algorithm for identifying damage-affected 3D Gaussians even when only weak single-view supervision is available (Chileban et al., 28 Sep 2025).
1. Pipeline Structure and Motivation
Traditional image-based damage detection with instance segmentation networks (e.g., Mask R-CNN, YOLO) yields fast 2D masks per image but cannot resolve geometric ambiguities or quantify the depth, area, or extent of damage. Furthermore, 2D approaches are limited in handling scenarios where damage—such as a fine scratch or small dent—is visible solely from a specific camera pose. CrashSplat addresses these limitations by uplifting 2D masks to 3D within a Gaussian Splatting framework, enabling:
- Geometrically accurate localization and quantification of damage.
- View-consistent visualization and analysis, including novel view synthesis.
- Reliable segmentation when the target damage is only apparent in a single view.
The core stages of the pipeline are:
- 2D damage instance segmentation.
- Sparse 3D reconstruction and camera parameter extraction via COLMAP (SfM).
- Per-scene 3D Gaussian Splatting (3D-GS) modeling.
- Single-view 3D segmentation by projecting Gaussians, conducting Z-buffer filtering and statistical cleaning to isolate damage Gaussians associated with a selected 2D mask.
2. 2D Damage Mask Generation
CrashSplat utilizes a YOLOv11 segmentation network (Ultralytics, "l" variant) to produce 2D damage masks on each RGB frame. The model is trained on the CarDD (4,000 images, 6 classes) and VehiDE (13,945 images, 8 classes) datasets. Inference runs in under 100 ms per 640×640 image on Apple M3 Pro hardware.
Reported instance segmentation performance:
- CarDD: mAP@50 = 75.8, mAP@50:95 = 58.5
- VehiDE: mAP@50 = 51.2, mAP@50:95 = 29.4
These 2D segmentation masks serve as the initial supervisory signal for damage localization and initiate the 2D-to-3D uplift protocol.
3. Structure-from-Motion Preprocessing
For geometric calibration, CrashSplat applies COLMAP (Schönberger & Frahm, 2016). The process yields camera intrinsics $K$ and extrinsics $[R \mid \mathbf{t}]$, consisting of rotation matrix $R$ and translation vector $\mathbf{t}$. The standard pinhole transformation is employed,
$$\tilde{\mathbf{x}} \sim K \left( R \mathbf{X} + \mathbf{t} \right),$$
which explicitly maps a world point $\mathbf{X}$ into the camera frame for projecting 3D Gaussians onto the image plane.
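As a concrete sketch, the pinhole projection of world points through $K$, $R$, $\mathbf{t}$ takes only a few lines of numpy (the function name and the example camera values below are illustrative, not from the paper):

```python
import numpy as np

def project_points(K, R, t, X_world):
    """Project Nx3 world points to pixels via the pinhole model
    x ~ K (R X + t). Returns pixel coordinates and camera-frame depths."""
    X_cam = X_world @ R.T + t               # world -> camera frame
    x_hom = X_cam @ K.T                     # apply intrinsics
    depth = x_hom[:, 2]
    pixels = x_hom[:, :2] / depth[:, None]  # perspective division
    return pixels, depth

# Identity pose, principal point (320, 240), focal length 500 px
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0]])
px, z = project_points(K, R, t, pts)
print(px)  # the point on the optical axis lands on the principal point
```

The same transform is reused later in the pipeline to test whether a Gaussian's projected center falls inside a 2D damage mask.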
4. 3D Gaussian Splatting Representation
CrashSplat leverages 3D Gaussian Splatting to encode the reconstructed scene as a set of Gaussians characterized by:
- Mean position $\boldsymbol{\mu} \in \mathbb{R}^3$
- 3×3 covariance $\Sigma$ (ellipsoid structure)
- RGB color $\mathbf{c}$
- Opacity $\alpha$

The spatial density is
$$G(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).$$
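The per-Gaussian density $\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$ can be evaluated directly; the snippet below is a minimal numpy sketch (the function name and the example anisotropic covariance are illustrative):

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Unnormalized 3D Gaussian density exp(-0.5 (x-mu)^T cov^-1 (x-mu)),
    which defines each splat's spatial extent."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

mu = np.array([0.0, 0.0, 0.0])
cov = np.diag([0.04, 0.04, 0.01])   # anisotropic: a flattened ellipsoid
print(gaussian_density(mu, mu, cov))                          # 1.0 at the mean
print(gaussian_density(np.array([0.2, 0.0, 0.0]), mu, cov))   # exp(-0.5)
```

Because the density is unnormalized, it peaks at exactly 1 at the mean, which is why it can be scaled by the per-Gaussian opacity $\alpha$ during rasterization.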
Rasterization, as per Kerbl et al. (2023), is performed by:
- Projecting Gaussians to 16×16-pixel image tiles
- Sorting Gaussians by depth $z$ (front-to-back)
- Opacity accumulation: $\alpha_i' = \alpha_i \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i')^{\top}\Sigma_i'^{-1}(\mathbf{x}-\boldsymbol{\mu}_i')\right)$, with $\boldsymbol{\mu}_i'$ and $\Sigma_i'$ the projected 2D mean and covariance
- Color compositing: $C(\mathbf{x}) = \sum_i \mathbf{c}_i \, \alpha_i' \prod_{j<i} (1-\alpha_j')$, traversed back-to-front in the backward (gradient) pass
This representation supports highly efficient view-synthesis and facilitates precise mask lifting from 2D to 3D.
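Per pixel, the front-to-back accumulation and compositing rule $C = \sum_i \mathbf{c}_i \alpha_i' \prod_{j<i}(1-\alpha_j')$ reduces to a short loop over depth-sorted Gaussians; the following is a minimal sketch (names chosen here for illustration):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing for one pixel:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j),
    with Gaussians already sorted by ascending depth."""
    C = np.zeros(3)
    T = 1.0                                  # accumulated transmittance
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, dtype=float)
        T *= (1.0 - a)                       # attenuate for Gaussians behind
    return C, T

# An 80%-opaque red splat in front of a fully opaque green one
C, T = composite([(1, 0, 0), (0, 1, 0)], [0.8, 1.0])
print(C)  # red dominates; green contributes only through the residual 0.2
print(T)  # remaining transmittance is 0: the pixel is fully covered
```

In the real tile-based rasterizer this loop runs per pixel over the depth-sorted Gaussians overlapping each 16×16 tile, typically with early termination once $T$ falls below a small threshold.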
5. Single-View 3D-GS Segmentation Algorithm
CrashSplat’s learning-free segmentation algorithm proceeds as follows:
a) Select one input frame and retrieve its 2D damage mask.
b) Project the center of each 3D Gaussian onto the frame using the pinhole model; discard Gaussians whose projected center falls outside the 2D mask polygon.
c) Z-buffer filtering:
- Sort Gaussians by ascending depth $z_i$.
- For each accepted Gaussian, the per-pixel contribution is weighted by $\alpha_i' = \alpha_i \exp\!\left(-\tfrac{1}{2} d_M^2\right)$, where the Mahalanobis distance is $d_M^2 = (\mathbf{x}-\boldsymbol{\mu}_i')^{\top}\Sigma_i'^{-1}(\mathbf{x}-\boldsymbol{\mu}_i')$, with projected 2D mean $\boldsymbol{\mu}_i'$ and projected covariance $\Sigma_i' = J W \Sigma_i W^{\top} J^{\top}$ ($W$ the world-to-camera rotation, $J$ the Jacobian of the projective transformation).
- A dynamic threshold $\tau$ is set as the mean opacity of the accepted Gaussians; Gaussians with $\alpha_i' < \tau$ are discarded.
d) Statistical cleaning: discard any Gaussian whose depth $z_i$ falls outside $\bar{z} \pm k\sigma_z$, where $\bar{z}$ and $\sigma_z$ are the mean and standard deviation of the accepted depths.
e) Hole-filling: re-add Gaussians whose depth $z_i$ lies within $k\sigma_z$ of the mean mask depth $\bar{z}$, even if not selected by Z-buffering.
This scheme is robust to situations where damage is only evident in a single pose and multi-view consistency cannot be enforced.
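Steps (b)–(e) can be sketched in numpy under stated assumptions: projected centers, depths, opacities, and mask membership are precomputed, and the outlier factor `k` and all names are chosen here for illustration (the paper's exact thresholds may differ):

```python
import numpy as np

def select_damage_gaussians(depth, alpha, in_mask, k=2.0):
    """Sketch of the single-view uplift: mask test, dynamic opacity
    thresholding, statistical depth cleaning, and hole-filling.
    depth: N depths, alpha: N opacities, in_mask: N bools indicating
    whether each Gaussian's projected center lies inside the 2D mask."""
    cand = in_mask.copy()                       # (b) mask test
    tau = alpha[cand].mean()                    # (c) dynamic opacity threshold
    keep = cand & (alpha >= tau)
    z_bar, z_std = depth[keep].mean(), depth[keep].std()
    keep &= np.abs(depth - z_bar) <= k * z_std  # (d) drop depth outliers
    # (e) hole-filling: re-add in-mask Gaussians near the mean mask depth,
    # even if the opacity test rejected them
    keep |= cand & (np.abs(depth - z_bar) <= k * z_std)
    return keep

rng = np.random.default_rng(0)
n = 100
depth = np.concatenate([rng.normal(2.0, 0.05, n - 1), [10.0]])  # one far outlier
alpha = rng.uniform(0.2, 1.0, n)
in_mask = np.ones(n, dtype=bool)
keep = select_damage_gaussians(depth, alpha, in_mask)
print(keep[-1])  # the depth outlier is rejected: False
```

The hole-filling step deliberately readmits low-opacity Gaussians at plausible depths, which matters for thin damages such as scratches whose Gaussians may individually contribute little opacity.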
6. Experimental Evaluation
CrashSplat was evaluated on both proprietary self-recorded vehicle damage sequences (scratches, flat tire, broken lamp) and public SPIn-NeRF scenes. The primary evaluation metrics were Intersection-over-Union (IoU), F1 score, pixel-accuracy, and runtime per instance.
Self-recorded quantitative results:
| Object | IoU (%) input / 3-view mean | F1 (%) input / 3-view mean | Accuracy (%) input / 3-view mean | Time (s) |
|---|---|---|---|---|
| Scratch | 65.7 / 52.4 | 79.3 / 67.8 | 99.30 / 99.28 | 0.04 |
| Flat tire | 88.2 / 87.2 | 93.7 / 93.1 | 99.01 / 99.01 | 0.12 |
| Broken lamp | 82.4 / 67.1 | 90.4 / 78.8 | 97.93 / 97.70 | 0.31 |
Public SPIn-NeRF scene results (mean over 6 scenes):
| Method | IoU (%) | Accuracy (%) |
|---|---|---|
| Single-View (Cen et al., 2025) | 81.1 | 97.1 |
| MVSeg (Mirzaei et al., 2023) | 90.9 | 99.1 |
| SA3D (Cen et al., IJCV 2025) | 93.1 | 99.0 |
| SAGD (Hu et al., 2024) | 91.1 | 98.8 |
| CrashSplat (1 mask) | 79.9 | 96.9 |
Qualitative assessments (e.g., scratches, wheel dents, and lamp cracks projected back into image views) showed precise alignment with ground-truth masks. Visual comparison on the "Truck" scene indicates that performance is competitive with multi-view methods using only a single 2D mask.
7. Strengths, Limitations, and Future Directions
CrashSplat’s design enables true single-view 3D damage segmentation without requiring multiple annotated masks or scene-specific learning. The pipeline features a sub-second, CPU-only per-instance runtime and robust handling of fine or view-specific damages. Its learning-free 3D step eliminates the need for foundation models such as SAM.
Principal limitations include:
- Dependence on the accuracy of the upstream 2D detector; false positives/negatives in YOLO masks (e.g., from shadows or reflections) propagate to 3D. Augmentation with additional synthetic data may improve robustness.
- Sensitivity to the quality of Gaussian Splatting reconstructions. Low-opacity Gaussians on glass or reflective surfaces may eliminate true damage evidence.
- The evaluation protocol relies on back-projected 2D masks due to the absence of real 3D damage ground truth. Construction of a dedicated 3D damage dataset remains an open problem.
- The segmentation algorithm is currently single-threaded. Future work could focus on tiling strategies, multi-threading, or GPU acceleration to reduce latency further.
CrashSplat establishes a principled method for automating 3D vehicle damage segmentation from casual video captures, combining off-the-shelf 2D networks, established SfM methods, advanced 3D-GS modeling, and a lightweight per-view filtering approach for single-view supervision (Chileban et al., 28 Sep 2025).