
CrashSplat: 3D Vehicle Damage Assessment

Updated 18 February 2026
  • CrashSplat is an end-to-end pipeline that uplifts 2D damage masks into 3D Gaussian Splatting representations, enabling geometrically accurate damage localization.
  • It integrates 2D instance segmentation, Structure-from-Motion, and a learning-free 3D segmentation algorithm to achieve fast, view-consistent damage assessment.
  • The system demonstrates competitive segmentation performance while offering sub-second, CPU-only inference for practical vehicle damage analysis.

CrashSplat is an end-to-end pipeline for vehicle damage assessment that performs 3D segmentation of damages by uplifting 2D masks to 3D Gaussian Splatting (3D-GS) representations. It is specifically designed to overcome limitations inherent in existing 2D approaches, notably the inability to provide geometrically faithful localization, cross-view consistency, and robustness to damages visible only in a single view. CrashSplat features an automatic pipeline composed of four principal stages: per-frame 2D instance segmentation, Structure-from-Motion (SfM) to estimate camera geometry, 3D Gaussian Splatting reconstruction, and a learning-free, single-view segmentation algorithm for identifying damage-affected 3D Gaussians even when only weak single-view supervision is available (Chileban et al., 28 Sep 2025).

1. Pipeline Structure and Motivation

Traditional image-based damage detection with instance segmentation networks (e.g., Mask R-CNN, YOLO) yields fast 2D masks per image but cannot resolve geometric ambiguities or quantify the depth, area, or size of damages. Furthermore, 2D approaches are limited in handling scenarios where damage—such as a fine scratch or small dent—is visible solely in a specific camera pose. CrashSplat addresses these limitations by uplifting 2D masks to 3D within a Gaussian Splatting framework, enabling:

  • Geometrically accurate localization and quantification of damage.
  • View-consistent visualization and analysis, including novel view synthesis.
  • Reliable segmentation when the target damage is only apparent in a single view.

The core stages of the pipeline are:

  1. 2D damage instance segmentation.
  2. Sparse 3D reconstruction and camera parameter extraction via COLMAP (SfM).
  3. Per-scene 3D Gaussian Splatting (3D-GS) modeling.
  4. Single-view 3D segmentation by projecting Gaussians, conducting Z-buffer filtering and statistical cleaning to isolate damage Gaussians associated with a selected 2D mask.
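Schematically, the four stages compose as follows. Every function name below is a hypothetical placeholder standing in for the component named in its comment, not an interface from the paper:

```python
# Hypothetical orchestration of the four pipeline stages; each helper is a
# placeholder stub standing in for the real component named in its comment.

def segment_2d(frames):            # stage 1: YOLO-style instance segmentation
    return [{"frame": f, "mask": None} for f in frames]

def run_sfm(frames):               # stage 2: COLMAP camera estimation
    return {"intrinsics": "K", "poses": [f"pose_{i}" for i, _ in enumerate(frames)]}

def fit_3dgs(frames, cameras):     # stage 3: per-scene Gaussian Splatting
    return {"gaussians": [], "cameras": cameras}

def uplift(scene, mask2d):         # stage 4: single-view 3D segmentation
    return {"damage_gaussians": [], "source_mask": mask2d}

frames = ["img_0.jpg", "img_1.jpg"]
masks = segment_2d(frames)
scene = fit_3dgs(frames, run_sfm(frames))
result = uplift(scene, masks[0]["mask"])
```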

2. 2D Damage Mask Generation

CrashSplat utilizes a YOLOv11 segmentation network (Ultralytics, "l" variant) to produce 2D damage masks on each RGB frame. The model is trained on the CarDD (4,000 images, 6 classes) and VehiDE (13,945 images, 8 classes) datasets. Inference takes under 100 ms per 640×640 image on Apple M3 Pro hardware.

Reported instance segmentation performance:

  • CarDD: mAP@50 = 75.8, mAP@50:95 = 58.5
  • VehiDE: mAP@50 = 51.2, mAP@50:95 = 29.4

These 2D segmentation masks serve as the initial supervisory signal for damage localization and initiate the 2D-to-3D uplift protocol.

3. Structure-from-Motion Preprocessing

For geometric calibration, CrashSplat applies COLMAP (Schönberger & Frahm, 2016). The process yields the camera intrinsics,

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

and extrinsics, consisting of a rotation matrix $R \in \mathbb{R}^{3 \times 3}$ and translation $t \in \mathbb{R}^3$. The standard pinhole transformation $\mathbf{x}_{\mathrm{image}} \sim K\,[\,R \mid t\,]\,\mathbf{X}_{\mathrm{world}}$ is employed, with an explicit mapping between the world and camera frames for projecting 3D Gaussians.
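As a quick illustration, the pinhole projection above can be applied with a few lines of NumPy; the intrinsics values here are arbitrary example numbers, not calibration output from the paper:

```python
import numpy as np

def project_points(K, R, t, X_world):
    """Project Nx3 world points to pixel coordinates with the pinhole model
    x_image ~ K [R | t] X_world (names follow the text above)."""
    X_cam = X_world @ R.T + t          # world -> camera frame
    x = X_cam @ K.T                    # apply intrinsics
    return x[:, :2] / x[:, 2:3]        # perspective divide -> (u, v)

# Identity pose, principal point (320, 240), focal length 500 px (example values)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
uv = project_points(K, R, t, np.array([[0.0, 0.0, 2.0]]))
# a point on the optical axis lands on the principal point
```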

4. 3D Gaussian Splatting Representation

CrashSplat leverages 3D Gaussian Splatting to encode the reconstructed scene as a set of $N$ Gaussians $\{G_i\}_{i=1}^N$ characterized by:

  • Mean position $\mu_i \in \mathbb{R}^3$
  • 3×3 covariance $\Sigma_i$ (ellipsoid structure)
  • RGB color $C_i \in \mathbb{R}^3$
  • Opacity $\alpha_i$

The spatial density is

$$G_i(\mathbf{x}) = w_i\,\exp\Bigl(-\tfrac{1}{2} (\mathbf{x} - \mu_i)^\top \Sigma_i^{-1} (\mathbf{x} - \mu_i)\Bigr)$$
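A minimal sketch of evaluating this density at a point; the isotropic covariance here is an assumption chosen for the example:

```python
import numpy as np

def gaussian_density(x, mu, Sigma, w=1.0):
    """Evaluate G_i(x) = w * exp(-0.5 (x - mu)^T Sigma^{-1} (x - mu))."""
    d = x - mu
    return w * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

mu = np.zeros(3)
Sigma = 0.25 * np.eye(3)                # isotropic ellipsoid, std 0.5
peak = gaussian_density(mu, mu, Sigma)  # density is maximal (= w) at the mean
```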

Rasterization, as per Kerbl et al. (2023), is performed by:

  • Projecting Gaussians to image tiles (16×16 pixels)
  • Sorting Gaussians by depth $Z_{\mathrm{cam}}$ (front-to-back)
  • Opacity accumulation: $\alpha_{\mathrm{acc}} \leftarrow \alpha_{\mathrm{acc}} + (1-\alpha_{\mathrm{acc}})\,\alpha_i$, halting once $\alpha_{\mathrm{acc}} \geq T$ ($T \approx 1$)
  • Color compositing (backwards pass): $C_{\mathrm{out}} \leftarrow \alpha_i\,C_i + (1-\alpha_i)\,C_{\mathrm{prev}}$
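The two accumulation rules above can be sketched per pixel as follows; this is a simplified scalar-opacity illustration, not the tile-based rasterizer itself:

```python
import numpy as np

def composite_pixel(alphas, colors, T=0.999):
    """Per-pixel compositing sketch following the update rules above.
    `alphas`/`colors` are sorted front-to-back; opacity accumulation halts
    once alpha_acc >= T, then the accepted Gaussians' colors are blended
    in a backwards (back-to-front) over pass."""
    alpha_acc, accepted = 0.0, 0
    for a in alphas:                      # front-to-back opacity pass
        alpha_acc += (1.0 - alpha_acc) * a
        accepted += 1
        if alpha_acc >= T:
            break
    C = np.zeros(3)                       # backwards color pass
    for a, c in zip(alphas[:accepted][::-1],
                    np.asarray(colors[:accepted])[::-1]):
        C = a * c + (1.0 - a) * C
    return C, alpha_acc

C, acc = composite_pixel([1.0], [np.array([1.0, 0.0, 0.0])])
# a fully opaque front Gaussian returns its own color and saturates alpha_acc
```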

This representation supports highly efficient view-synthesis and facilitates precise mask lifting from 2D to 3D.

5. Single-View 3D-GS Segmentation Algorithm

CrashSplat’s learning-free segmentation algorithm proceeds as follows:

a) Select one input frame and retrieve its 2D damage mask.

b) Project the center of each 3D Gaussian onto the frame using the pinhole model; discard Gaussians outside the 2D mask polygon.

c) Z-buffer filtering:

  • Sort Gaussians by ascending depth $z_i$.
  • For each accepted Gaussian, the per-pixel contribution is weighted by:

$$w(\mathbf{p}) = \alpha_i \exp\left(-\tfrac{1}{2} D^2(\mathbf{p})\right)$$

where the Mahalanobis distance is $D^2(\mathbf{p}) = (\mathbf{p} - \mu)^\top \Sigma_{\mathrm{image}}^{-1} (\mathbf{p} - \mu)$, and $\Sigma_{\mathrm{image}} = J\,\Sigma_{\mathrm{cam}}\,J^\top$ with

$$J = \begin{bmatrix} f_x/z & 0 & -f_x\,x/z^2 \\ 0 & f_y/z & -f_y\,y/z^2 \end{bmatrix}$$

  • A dynamic threshold $\beta$ is set as the mean opacity of the accepted Gaussians.
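The covariance projection and per-pixel weighting of step (c) can be sketched as below. This is an illustrative NumPy version, not the authors' implementation:

```python
import numpy as np

def project_covariance(Sigma_cam, x, y, z, fx, fy):
    """Sigma_image = J @ Sigma_cam @ J.T, with J the perspective Jacobian
    given above, evaluated at camera-frame position (x, y, z)."""
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    return J @ Sigma_cam @ J.T

def pixel_weight(p, mu_img, Sigma_img, alpha):
    """Per-pixel contribution w(p) = alpha * exp(-0.5 * D^2(p)), with the
    Mahalanobis distance D^2 taken under the projected 2D covariance."""
    d = p - mu_img
    return alpha * np.exp(-0.5 * d @ np.linalg.solve(Sigma_img, d))
```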

d) Statistical cleaning: discard any Gaussian with $|z - \bar{z}| > 2\sigma_z$ or $|\alpha - \bar{\alpha}| > 2\sigma_\alpha$.

e) Hole-filling: re-add Gaussians whose $z$ lies within $2\sigma$ of the mean mask depth, even if not selected by Z-buffering.
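Steps (b) and (d), mask membership via center projection followed by 2σ statistical cleaning, can be sketched as follows; the Z-buffer weighting of step (c) and the hole-filling of step (e) are omitted for brevity, and all array shapes and toy values are assumptions:

```python
import numpy as np

def segment_gaussians(mu_world, alphas, K, R, t, mask):
    """Learning-free single-view selection sketch (steps b and d above).
    Assumed shapes: mu_world (N, 3), alphas (N,), mask a boolean HxW array."""
    # b) project Gaussian centers with the pinhole model
    X_cam = mu_world @ R.T + t
    z = X_cam[:, 2]
    uv = X_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    H, W = mask.shape
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    inside[inside] &= mask[v[inside], u[inside]]   # keep centers inside the mask
    # d) statistical cleaning: drop 2-sigma outliers in depth and opacity
    sel = np.where(inside)[0]
    zs, als = z[sel], alphas[sel]
    keep = (np.abs(zs - zs.mean()) <= 2 * zs.std()) & \
           (np.abs(als - als.mean()) <= 2 * als.std())
    return sel[keep]

# Toy scene: one Gaussian projects inside a rectangular mask, one outside
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 280:360] = True
mu = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
sel = segment_gaussians(mu, np.array([0.8, 0.8]), K, np.eye(3), np.zeros(3), mask)
```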

This scheme is robust to situations where damage is only evident in a single pose and multi-view consistency cannot be enforced.

6. Experimental Evaluation

CrashSplat was evaluated on both proprietary self-recorded vehicle damage sequences (scratches, flat tire, broken lamp) and public SPIn-NeRF scenes. The primary evaluation metrics were Intersection-over-Union (IoU), F1 score, pixel-accuracy, and runtime per instance.

Self-recorded quantitative results:

| Object | IoU (input / mean 3 views) | F1 (input / mean 3 views) | Accuracy (input / mean 3 views) | Time (s) |
|---|---|---|---|---|
| Scratch | 65.7% / 52.4% | 79.3% / 67.8% | 99.30% / 99.28% | 0.04 |
| Flat tire | 88.2% / 87.2% | 93.7% / 93.1% | 99.01% / 99.01% | 0.12 |
| Broken lamp | 82.4% / 67.1% | 90.4% / 78.8% | 97.93% / 97.70% | 0.31 |

Public SPIn-NeRF scene results (mean over 6 scenes):

| Method | IoU (%) | Accuracy (%) |
|---|---|---|
| Single-View (Cen et al., 2025) | 81.1 | 97.1 |
| MVSeg (Mirzaei et al., 2023) | 90.9 | 99.1 |
| SA3D (Cen et al., IJCV 2025) | 93.1 | 99.0 |
| SAGD (Hu et al., 2024) | 91.1 | 98.8 |
| CrashSplat (1 mask) | 79.9 | 96.9 |

Qualitative assessments (e.g., projected back scratches, wheel dents, lamp cracks) showed precise alignment with ground-truth masks. Visual comparison on “Truck” scenes indicates that performance is competitive with multi-view methods using only a single 2D mask.

7. Strengths, Limitations, and Future Directions

CrashSplat’s design enables true single-view 3D damage segmentation without requiring multiple annotated masks or scene-specific learning. The pipeline features a sub-second, CPU-only per-instance runtime and robust handling of fine or view-specific damages. Its learning-free 3D step eliminates the need for foundation models such as SAM.

Principal limitations include:

  • Dependence on the accuracy of the upstream 2D detector; false positives/negatives in YOLO masks (e.g., from shadows or reflections) propagate to 3D. Augmentation with additional synthetic data may improve robustness.
  • Sensitivity to the quality of Gaussian Splatting reconstructions. Low-opacity Gaussians on glass or reflective surfaces may eliminate true damage evidence.
  • The evaluation protocol relies on back-projected 2D masks due to the absence of real 3D damage ground truth. Construction of a dedicated 3D damage dataset remains an open problem.
  • The segmentation algorithm is currently single-threaded. Future work could focus on tiling strategies, multi-threading, or GPU acceleration to reduce latency further.

CrashSplat establishes a principled method for automating 3D vehicle damage segmentation from casual video captures, combining off-the-shelf 2D networks, established SfM methods, advanced 3D-GS modeling, and a lightweight per-view filtering approach for single-view supervision (Chileban et al., 28 Sep 2025).
