PELD: Pedestrian Patterns Dataset
- The PELD dataset is a large-scale vision-based collection of full-HD video and GPS data capturing pedestrian-vehicle interactions across urban, mixed, and suburban routes.
- It employs systematic multi-timeslot data acquisition over a week to support spatio-temporal analysis and predictive risk modeling for autonomous navigation.
- The processing pipeline uses a Fast R-CNN detector on video frames to derive pedestrian density metrics, despite measurement noise and limited behavioral annotations.
The PELD dataset refers to several distinct resources across diverse domains. The most prominent and original usage designates the Pedestrian Patterns Dataset, a large-scale, repeatedly sampled, multimodal corpus for research in automated risk-aware navigation and long-term localization. PELD has also appeared in unrelated contexts—endoscopic spinal instance segmentation, personality-conditioned dialog systems, and pulmonary embolism detection—each time as an acronym for a separate dataset. This entry details the Pedestrian Patterns Dataset (PELD) as introduced by Mokhtari and Wagner, highlighting technical specifics and research applications, while distinguishing it from homonymous corpora in other subfields (Mokhtari et al., 2020).
1. Dataset Definition and Purpose
The Pedestrian Patterns Dataset (PELD) is a large-scale vision-based collection of full-HD video and GPS data covering 126 traversals of three distinct urban, mixed, and suburban routes in State College, Pennsylvania. Its temporal regime (six timeslots per day, seven consecutive days) facilitates analysis of spatio-temporal pedestrian density and dynamic risk modeling for autonomous vehicles. PELD's primary motivation is to support predictive modeling of route-specific risk and to enable long-term vision-based localization by capturing repeatable, dynamic visual cues under real-world operational variability.
2. Data Acquisition Protocol
PELD acquisition employed a Samsung Galaxy S9+ dual-camera system, recording at 1920×1080 px, 60 fps, mounted within a vehicle, and GPSLogger for 1 Hz geolocation. Three routes—each 2.8–3.5 miles—were selected for density contrast: (A) campus core, (B) urban/residential, and (C) suburban/rural. Over the course of one week, each route was traversed at 8:45, 10:45, 12:45, 14:45, 16:45, and 17:45 daily, yielding systematic coverage of diurnal and weekday/weekend effects. Environmental states (sunny, cloudy, rainy) were manually labeled for each traversal; lenses were cleaned prior to each run to ensure data quality. Video files were split into ≈10-minute segments per timeslot to manage storage (total ≈600 GB) (Mokhtari et al., 2020).
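The schedule above can be enumerated directly: 3 routes × 7 days × 6 daily timeslots yields the 126 traversals reported for PELD. The sketch below is a hypothetical reconstruction of that schedule; the label strings are illustrative, not part of the dataset's file naming.

```python
from itertools import product

# Hypothetical reconstruction of the PELD acquisition schedule:
# 3 routes x 7 consecutive days x 6 daily timeslots = 126 traversals.
routes = ["A", "B", "C"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
timeslots = ["08:45", "10:45", "12:45", "14:45", "16:45", "17:45"]

schedule = [
    {"route": r, "day": d, "time": t}
    for r, d, t in product(routes, days, timeslots)
]

assert len(schedule) == 126  # matches the traversal count reported for PELD
```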
3. Processing Pipeline and Annotation
Pedestrian detection used Fast R-CNN (pretrained on PASCAL VOC, with no in-domain fine-tuning) applied to every video frame (60 fps). Inference produced frame-wise pedestrian bounding boxes with confidence scores. On a GTX 1070 GPU, a 7–10 min segment required ≈20 min for full-frame inference. These detections were then aggregated:
- Pedestrian count per frame: $N(f)$, the number of detections in frame $f$ with confidence above a fixed threshold.
- Segmental pedestrian density: for a route segment $s$ of length $\ell_s$ at time $t$, the density is
  $$\rho_s(t) = \frac{N_s(t)}{\ell_s},$$
  where $N_s(t)$ is the count assigned to segment $s$ at time $t$.
- Total traversal load: $L = \sum_s \sum_t N_s(t)$, the aggregate pedestrian count over all segments and times of a traversal.
The dataset does not supply manual bounding box refinement; all annotations rely on the off-the-shelf Fast R-CNN model, introducing measurement noise proportionate to the detector's recall/precision on free-road video. No age, posture, or trajectory is annotated per instance.
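The aggregation steps above can be sketched as follows. This is a minimal illustration, not the dataset's released tooling; the confidence threshold and the frame-to-segment mapping are assumptions.

```python
from collections import defaultdict

def per_frame_counts(detections, conf_threshold=0.5):
    """Count pedestrian boxes per frame above a confidence threshold.

    `detections` maps frame index -> list of confidence scores; the
    threshold value is an assumption, not specified by the dataset.
    """
    return {f: sum(1 for c in scores if c >= conf_threshold)
            for f, scores in detections.items()}

def segment_density(counts, frame_to_segment, segment_lengths):
    """Aggregate frame counts into per-segment density (pedestrians / mile)."""
    totals = defaultdict(int)
    for f, n in counts.items():
        totals[frame_to_segment[f]] += n
    return {s: totals[s] / segment_lengths[s] for s in segment_lengths}

# Toy example: 4 frames split across 2 half-mile segments.
dets = {0: [0.9, 0.7], 1: [0.4], 2: [0.8], 3: []}
counts = per_frame_counts(dets)              # {0: 2, 1: 0, 2: 1, 3: 0}
density = segment_density(counts, {0: "s1", 1: "s1", 2: "s2", 3: "s2"},
                          {"s1": 0.5, "s2": 0.5})
# density == {"s1": 4.0, "s2": 2.0}; total traversal load = sum of counts = 3
```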
4. Data Format, Splitting, and Licensing
PELD is structured hierarchically by route, day, and timeslot:
- Directory layout:
```
Route_A/, Route_B/, Route_C/
  Monday/
    08_45/
      08_45.mp4          (raw video)
      08_45-python.mp4   (annotated video)
      08_45.gpx          (GPS)
      08_45.xlsx         (frame-level detections)
  ...repeated for all timeslots, days, and routes
```
Each timeslot aligns video, GPS, and detection count data. Processed videos show bounding boxes overlaid at 20 fps. File naming is systematic. The dataset is publicly downloadable under CC BY-NC 4.0; researchers must cite the original work in publications. No credentials are required for access (Mokhtari et al., 2020).
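Given the documented hierarchy, resolving the four aligned files for one traversal is a simple path construction. The helper below is illustrative; the dataset root and function name are assumptions.

```python
from pathlib import Path

def traversal_files(root, route, day, timeslot):
    """Resolve the four aligned files for one PELD traversal.

    Layout follows the documented hierarchy Route_X/<Day>/<HH_MM>/;
    the root path is user-supplied.
    """
    base = Path(root) / f"Route_{route}" / day / timeslot
    return {
        "raw_video": base / f"{timeslot}.mp4",
        "annotated_video": base / f"{timeslot}-python.mp4",
        "gps": base / f"{timeslot}.gpx",
        "detections": base / f"{timeslot}.xlsx",
    }

files = traversal_files("/data/PELD", "A", "Monday", "08_45")
# files["gps"] -> /data/PELD/Route_A/Monday/08_45/08_45.gpx
```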
5. Quantitative Characteristics and Baseline Results
PELD comprises:
| Route | Total Frames | Frames with ≥1 Pedestrian |
|---|---|---|
| A | 1,150,000 | 740,000 |
| B | 1,150,000 | 760,000 |
| C | 1,050,000 | 660,000 |
This coverage yields millions of bounding-box detections. Fast R-CNN performance on PELD mirrors its PASCAL VOC results (≈70–75% AP). Processing throughput is ≈60 fps for raw extraction and ≈3 fps for annotated output. The dataset does not supply formal benchmarks; reported use cases include density estimation, route selection, risk assessment, and long-term localization. Example downstream tasks are risk-aware path planning that integrates pedestrian-density estimates with specified loss functions, and vision-based place recognition under pedestrian-induced scene dynamics.
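The per-route figures in the table can be turned into simple coverage statistics, e.g. the share of frames containing at least one detected pedestrian:

```python
# Per-route frame totals from the table above; computes the share of
# frames containing at least one detected pedestrian.
frames = {"A": 1_150_000, "B": 1_150_000, "C": 1_050_000}
ped_frames = {"A": 740_000, "B": 760_000, "C": 660_000}

share = {r: ped_frames[r] / frames[r] for r in frames}
# share["A"] is roughly 0.64: about two thirds of frames on the campus
# route contain at least one pedestrian detection.
total_frames = sum(frames.values())   # 3,350,000 frames overall
```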
6. Applications, Limitations, and Planned Extensions
PELD supports research into:
- Pedestrian density prediction and forecasting.
- Autonomous-vehicle route risk assessment integrating expected loss models.
- Spatio-temporal vision-based localization in dynamic scenes.
- Socially aware navigation, transferring insight to metropolitan environments (given PELD's geographic homogeneity, such transfer may require substantial domain adaptation).
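As a concrete illustration of expected-loss route risk assessment, the sketch below scores routes by a linear density-times-length model. This is a hypothetical model for exposition; the paper's actual loss functions are not specified here, and `loss_per_encounter` is an illustrative weight.

```python
def route_risk(segment_densities, segment_lengths, loss_per_encounter=1.0):
    """Expected-loss route risk: sum over segments of density x length x loss.

    A hypothetical linear model; `loss_per_encounter` is an
    illustrative weight, not a value from the dataset.
    """
    return sum(segment_densities[s] * segment_lengths[s] * loss_per_encounter
               for s in segment_lengths)

# Choose the lower-risk route between two candidates (toy densities).
dens_A = {"s1": 4.0, "s2": 2.0}     # pedestrians per mile, campus-like
dens_C = {"s1": 0.5, "s2": 0.3}     # pedestrians per mile, suburban-like
lengths = {"s1": 1.5, "s2": 1.5}    # miles

risk_A = route_risk(dens_A, lengths)   # 9.0
risk_C = route_risk(dens_C, lengths)   # ~1.2
best = min([("A", risk_A), ("C", risk_C)], key=lambda x: x[1])[0]  # "C"
```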
Current limitations include geographic scope (single college town), reliance on pretrained detectors (propagating FP/FN error), and the absence of fine-grained behavioral labeling (e.g., group formation, body posture). Planned extensions include manual bounding box refinement, behavioral tags, augmentation with winter traversals, LiDAR depth for 3D pedestrian tracking, and baseline code for risk estimation and Monte Carlo Localization using pedestrian-density priors (Mokhtari et al., 2020).
7. Terminological Note: PELD Homonymy Across Research Domains
Researchers should note that 'PELD' acronyms designate distinct datasets in other subfields:
- Endoscopic Spine Surgery: PELD is used for an instance segmentation dataset (61 patients, 610 endoscopic frames) with masks for adipose, bone, ligamentum flavum, and nerve, distinct from the pedestrian dataset (Lai et al., 2025).
- Dialogue Systems: PELD denotes the Personality EmotionLines Dataset, a corpus of sitcom-derived dialogue triples annotated for emotion and speaker personality (Wen et al., 2024).
- Pulmonary Embolism Imaging: The PELD dataset (Pulmonary Embolism in CTA images) comprises 5,160 DICOM slices (20 patients) with expert-segmented PE masks, for CAD benchmarking in medical imaging (Masoudi et al., 2017).
- Pediatric Neuroimaging: PediDemi, not PELD, is the dataset for demyelinating brain lesion segmentation (Popa et al., 2025).
The Pedestrian Patterns Dataset remains the principal 'PELD' for urban pedestrian-vehicle dynamics and risk-aware localization (Mokhtari et al., 2020).