
Berkeley DeepDrive Drone Dataset

Updated 17 January 2026
  • B3D is a comprehensive aerial dataset offering high-resolution videos and detailed vehicle annotations for traffic analysis.
  • It supports research in decentralized multi-agent coordination, motion planning, and empirical modeling of traffic flow dynamics.
  • The accompanying development toolkit streamlines tasks such as object detection, trajectory estimation, and configuration management.

The Berkeley DeepDrive Drone Dataset (B3D) is a large-scale aerial video corpus designed for the empirical study of traffic dynamics, decentralized vehicle coordination, and scene understanding in understructured roadway environments. B3D provides top-down, high-resolution visual data, detailed vehicle annotations, and a development toolkit targeting research questions in motion planning, multi-agent coordination, and traffic flow modeling.

1. Dataset Composition and Data Collection

B3D consists of 20 post-processed 4K-resolution mp4 videos, each recorded at 30 FPS by a DJI Mavic 2 Pro quadcopter. The dataset is structured to capture critical real-world traffic phenomena in two primary classes of scenarios:

  • Junction Videos (8): Three-way and four-way intersections (jnc00–jnc02, jnc07) and unsignalized roundabouts with four or five legs (jnc03–jnc06).
  • Highway Videos (12): Tail-gating collisions (hwy00, hwy01), post-collision congestion (hwy02), stop-and-go wave formation and dissipation (hwy04, hwy05, with hwy03 as a free-flow baseline), roadwork-induced merging bottlenecks (hwy06–hwy08), and ramp-induced merging events (hwy09–hwy11).

Each video was captured with the drone hovering directly over the road surface, providing fully top-down coverage for durations of approximately 10 to 21.5 minutes. The scenes span a broad range of traffic densities, from free flow to heavy congestion, all recorded in natural daylight at Chinese road sites during December; no nighttime or low-light video is included. Post-processing establishes ground-truth scale normalization, standardizing junction scenes to 20.825 pixels/m and highway scenes to 6.667 pixels/m. Exact flight altitude and airspeed are not specified in the dataset documentation (Wu et al., 2022).
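
Because only these global scale factors are published, converting pixel measurements to physical units reduces to a single division. A minimal sketch (the function and scene labels below are illustrative, not part of the toolkit):

```python
# Convert pixel-space distances to meters using the published B3D
# scale factors (junction: 20.825 px/m, highway: 6.667 px/m).
SCALE_PX_PER_M = {"junction": 20.825, "highway": 6.667}

def px_to_m(pixels: float, scene: str) -> float:
    """Convert a pixel distance to meters for a given scene type."""
    return pixels / SCALE_PX_PER_M[scene]

# A 100-pixel gap between vehicles in a highway video is ~15 m.
gap_m = px_to_m(100, "highway")
```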

2. Annotation Schema and Image Data

Annotation uses a single object class—“vehicle”—covering all vehicle types (cars, trucks, buses) without subclass distinction. The image dataset comprises 16 002 manually annotated JPEG images, split 80/10/10 into training (12 700), validation (1 666), and testing (1 636) subsets. The annotation pipeline proceeds as follows:

  1. Crop video to focus on clusters of vehicles.
  2. Sample frames uniformly every 15 seconds.
  3. Assign axis-aligned bounding boxes to all visible vehicles (rotated boxes are not used).
  4. Store box labels without explicit timestamps; temporal context is inferred from the source video and frame index.
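
The uniform sampling in step 2 reduces to a simple index computation: at 30 FPS, one frame every 15 seconds means keeping every 450th frame. (Illustrative only; the release does not publish its extraction script.)

```python
# Sketch of the uniform frame-sampling step: at 30 FPS, sampling every
# 15 seconds keeps every 450th frame of a video.
FPS = 30
SAMPLE_PERIOD_S = 15

def sample_indices(n_frames: int) -> list[int]:
    """Indices of frames kept when sampling uniformly every 15 s."""
    step = FPS * SAMPLE_PERIOD_S  # 450 frames between samples
    return list(range(0, n_frames, step))

# A 10-minute video (18 000 frames) yields 40 sampled frames.
idx = sample_indices(10 * 60 * FPS)
```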

The final dataset contains approximately 265 000 bounding boxes: 135 303 from junction scenario images and 129 939 from highway scenario images. The average image contains roughly 16 vehicle instances. High-density scenes are prevalent, but no fine-grained statistics (e.g., histograms of vehicles per frame) are published. Annotation is performed using CVAT (Computer Vision Annotation Tool), with a recommended workflow that emphasizes manual quality control (Wu et al., 2022).

3. Development Toolkit and Trajectory Estimation

The B3D release includes a development kit with the following components:

  • Repository structure:
    • configs/: Configuration JSONs for training and inference.
    • Dockerfile: Reproducible environment setup.
    • download.py: Data download automation.
    • train.py: RetinaNet detector training (built atop Detectron2).
    • test.py: Inference and basic evaluation routines.
    • mask.py: Polygonal masking for region-of-interest selection.
    • vision/: COCO-format images and annotation JSONs.
    • videos/: The 20 aerial mp4 videos.
  • Detection and Trajectory Pipeline:
  1. Mask the region of interest in each frame using mask.py and a polygonal mask defined in CVAT.
  2. Detect vehicles per frame via RetinaNet inference (test.py), optionally using a provided model checkpoint.
  3. Manually inspect detections for gross errors.
  4. Associate frame-level detections into time-continuous tracks using an off-the-shelf multi-object tracker (e.g., SORT; specific implementation details are not prescribed).
  5. Convert pixel coordinates to world units using the published scale factor for each scenario.
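
Step 4 leaves the tracker choice open. As a rough illustration of what frame-to-frame association involves, here is a minimal greedy IoU matcher; all names are hypothetical, and a real pipeline would use SORT or similar, with motion prediction rather than raw last-box overlap:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedily extend each track with its best-overlapping detection;
    unmatched detections start new tracks."""
    unmatched = list(detections)
    for trk in tracks:
        best = max(unmatched, key=lambda d: iou(trk[-1], d), default=None)
        if best is not None and iou(trk[-1], best) >= thresh:
            trk.append(best)
            unmatched.remove(best)
    tracks.extend([d] for d in unmatched)
    return tracks

# Extend one existing track and spawn a new one for the far detection.
tracks = associate([[(0, 0, 10, 10)]], [(2, 2, 12, 12), (50, 50, 60, 60)])
```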

The paper does not include explicit coordinate transforms, filtering, trajectory smoothing equations, or per-scenario calibration steps. All world-coordinate trajectory computation leverages the static scale normalization applied during post-processing (Wu et al., 2022).

4. Consensus-Based Modeling and Use Cases

Although B3D is positioned as a foundation for the empirical study of decentralized vehicle coordination and “social driving etiquette” in environments that lack explicit right-of-way rules, the dataset release paper does not provide a concrete consensus algorithm, optimization objective, or motion-planning framework. No pseudocode, objective functions, or hyperparameter values are included. Instead, B3D is designed to support research in understanding emergent coordination and priority ordering in real-world, understructured traffic, with the expectation that researchers will apply their own consensus-based or game-theoretic models to the high-fidelity vehicle trajectory data (Wu et al., 2022).

A plausible implication is that B3D uniquely enables large-scale empirical calibration and evaluation of decentralized multi-agent planning strategies—particularly those relevant to emerging autonomous vehicle algorithms operating in non-Western road environments.

5. Benchmarks, Metrics, and Evaluation

The B3D release does not present any quantitative benchmarks for detection, tracking, or motion planning (e.g., Average Precision, MOTA, or planning success rates). Instead, it provides code examples illustrating the pipeline for detector training/inference (Detectron2/RetinaNet) and for applying polygonal masks with OpenCV/numpy. All performance assessment in the release paper is qualitative, using visual examples to demonstrate detection confidence and the challenges of ambiguous/small object localization.
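
The release's mask examples reportedly use OpenCV/numpy with a CVAT-defined polygon. The numpy-only sketch below (ray-casting point-in-polygon, an assumption rather than the toolkit's actual implementation) shows the effect of zeroing all pixels outside a region of interest:

```python
import numpy as np

def polygon_mask(shape, polygon):
    """Boolean mask that is True inside `polygon` [(x, y), ...],
    computed by ray casting."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    inside = np.zeros(shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        crosses = (ys > min(y1, y2)) & (ys <= max(y1, y2))
        xint = x1 + (ys - y1) * (x2 - x1) / (y2 - y1 + 1e-12)
        inside ^= crosses & (xs < xint)  # toggle on each edge crossing
    return inside

# Black out everything outside a square ROI in a white frame.
frame = np.full((100, 100, 3), 255, dtype=np.uint8)
roi = polygon_mask((100, 100), [(10, 10), (90, 10), (90, 90), (10, 90)])
frame[~roi] = 0
```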

No official evaluation metric or task is mandated, and the utility of the dataset for downstream modeling is left to subsequent user contributions. This open scope is intended to support a wide variety of research disciplines, from CV-based vehicle detection to macroscopic traffic modeling and multi-agent planning (Wu et al., 2022).

6. Application in Data-Driven PDE and Traffic Modeling

B3D’s real-world vehicle trajectory data has been used in downstream research, notably for learning and evaluating neural-network solvers for first-order hyperbolic conservation laws governing traffic dynamics. In one example, field data from the “highway_4” segment (400 meters, 4 lanes, 15 min duration) is processed via computer-vision tracking to extract timestamped 2D positions; these are binned into 10 m × 1 s spatiotemporal cells to construct a macroscopic density field ρ(x, t) (Baba et al., 10 Jan 2026). Derived quantities such as fundamental diagrams are computed by sliding-window methods over density and vehicle flow.
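
The binning step can be sketched as a 2D histogram over position and time; cell sizes and segment dimensions follow the description above, while the per-lane normalization is an assumption of this sketch:

```python
import numpy as np

# Bin timestamped positions into 10 m x 1 s cells and normalize by
# cell length and lane count to get a density field rho(x, t).
DX_M, DT_S = 10.0, 1.0
SEG_LEN_M, DURATION_S, LANES = 400.0, 15 * 60, 4

def density_field(xs, ts):
    """rho[i, j]: vehicles per meter per lane in cell (x_i, t_j)."""
    x_edges = np.arange(0, SEG_LEN_M + DX_M, DX_M)
    t_edges = np.arange(0, DURATION_S + DT_S, DT_S)
    counts, _, _ = np.histogram2d(xs, ts, bins=[x_edges, t_edges])
    return counts / (DX_M * LANES)

xs = np.array([5.0, 5.0, 395.0])  # positions along the road (m)
ts = np.array([0.5, 0.5, 0.5])    # observation timestamps (s)
rho = density_field(xs, ts)       # shape (40, 900)
```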

In this application, density fields extracted from B3D are used both for calibrating and testing classical and neural finite-volume PDE solvers. Models are trained to predict future density fields given local spatial/temporal stencils, with boundary conditions set by observed data. Multiple evaluation metrics are employed (L₁/L₂ norms, Dynamic Time Warping (DTW), win-rates), but B3D itself supplies only the processed vehicle trajectories; all higher-level features and metrics are computed downstream. The dataset facilitates robust, phase-correct testing of predictive models on empirical congestion waves and traffic instabilities, which are difficult to obtain in simulated or synthetic environments.
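
The simpler of the cited metrics, a relative L1 or L2 error between predicted and observed density fields, can be sketched as follows (DTW and win-rates are more involved and omitted; the relative normalization is an assumption of this sketch):

```python
import numpy as np

def rel_error(pred, obs, p=1):
    """Relative L1 (p=1) or L2 (p=2) error between two density fields."""
    diff = np.linalg.norm((pred - obs).ravel(), p)
    return diff / np.linalg.norm(obs.ravel(), p)

obs = np.array([[0.1, 0.2], [0.3, 0.4]])
pred = obs + 0.01                  # uniform over-prediction
err = rel_error(pred, obs, p=1)    # 0.04 / 1.0 = 0.04
```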

7. Limitations and Research Directions

B3D’s main limitations are directly reported:

  • Scope: No consensus algorithm, trajectory optimization code, or explicit social driving model is included in the initial release.
  • Annotations: No subclassification within “vehicle,” no rotated or timestamped bounding boxes, and no explicit occlusion or visibility labeling.
  • Benchmarks: Absence of quantitative detection/tracking baselines or leaderboards.
  • Generalization: Lighting is limited to winter daylight conditions; no low-light, adverse weather, or non-Chinese road scenes.
  • Coordinate Conversion: No code or equations for direct mapping between pixel/frame space and the physical world, beyond global scale normalization.

The dataset is explicitly intended as groundwork for research into decentralized planning, robust vehicle tracking, empirical game-theoretic modeling, and data-driven PDE approaches in complex, real-world driving environments (Wu et al., 2022, Baba et al., 10 Jan 2026). A plausible implication is that further work—including publication of reference consensus models, trajectory estimation toolchains, and comprehensive multi-agent benchmarks—will be necessary to fully realize B3D’s potential for multi-disciplinary research.
