BirdsEye-RU: Overhead Face Detection Dataset
- BirdsEye-RU is a publicly available dataset for overhead face detection, emphasizing tiny, occluded faces amid extreme scale variance and cluttered environments.
- It integrates 2,978 images from drone and smartphone sources with 8,448 annotated faces, using consistent YOLO-format labeling protocols.
- Evaluation protocols based on COCO-style metrics and baseline tests (e.g., YOLO) demonstrate its utility for surveillance, crowd analytics, and public safety research.
BirdsEye-RU is a publicly available dataset engineered for the detection of faces in overhead imagery, addressing persistent challenges inherent to extreme scale variance and environmental clutter. This resource comprises 2,978 images with 8,448 annotated faces, purposefully selected for inclusion of small and distant facial samples in a diverse array of real-world settings. The dataset amalgamates drone-sourced and overhead smartphone-captured images, positioning itself as a benchmark for surveillance, crowd analysis, and public-safety-oriented face detection in overhead contexts (Khan et al., 18 Jan 2026).
1. Dataset Structure and Quantitative Overview
BirdsEye-RU is partitioned into 2,978 images featuring a total of 8,448 faces, yielding a mean of 2.84 faces per image. Images originate from three principal sources: royalty-free drone video frames (Pexels.com), the established DroneFace dataset, and overhead smartphone images. Table 1 summarizes image and face occurrence statistics per source:
| Source | # Images | # Faces | Faces/Image |
|---|---:|---:|---:|
| Pexels.com Videos | 1,601 | 4,414 | 2.76 |
| DroneFace Dataset | 619 | 1,533 | 2.48 |
| Smartphone Images | 758 | 2,501 | 3.30 |
| **Total** | 2,978 | 8,448 | 2.84 |
Drone-sourced imagery (Pexels and DroneFace combined) accounts for 2,220 images and 5,947 faces, while smartphone imagery comprises 758 images with 2,501 faces. The face-size distribution is heavily skewed towards “small” faces, with a mode at approximately 10 px face height after resizing; over 70% of faces are below 20 px tall, while medium (20–40 px) and large (>40 px) faces constitute approximately 20% and 10%, respectively. Distance metadata from the DroneFace pipeline records camera heights of 1.5, 3, 4, and 5 meters, with subject distances ranging from 2 to 17 meters.
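The headline figures above can be cross-checked directly; a minimal sketch in Python, using only the per-source counts from Table 1:

```python
# Per-source (image count, face count) pairs from Table 1 of BirdsEye-RU.
sources = {
    "Pexels.com Videos": (1601, 4414),
    "DroneFace Dataset": (619, 1533),
    "Smartphone Images": (758, 2501),
}

total_images = sum(imgs for imgs, _ in sources.values())
total_faces = sum(faces for _, faces in sources.values())

for name, (imgs, faces) in sources.items():
    print(f"{name}: {faces / imgs:.2f} faces/image")
print(f"Total: {total_images} images, {total_faces} faces, "
      f"{total_faces / total_images:.2f} faces/image")
```

Running this reproduces the per-source ratios (2.76, 2.48, 3.30) and the overall mean of 2.84 faces per image.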
2. Data Acquisition and Preprocessing
BirdsEye-RU sources imagery via three distinct streams:
- Pexels.com Videos: 72 drone-captured, royalty-free videos (resolutions from 1280×720 to 3840×2160; frame rates 24–60 fps). Sampling at 1 fps led to the selection of 1,601 usable frames containing faces.
- DroneFace Dataset: Overhead captures using a GoPro Hero camera with a 170° field of view at 3680×2760 px resolution. Acquisition heights varied between 1.5 and 5 m, yielding 619 images. These images retain the original authors' annotations.
- Smartphone Images: Devices include Vivo Y36, Motorola Edge 50 Fusion, and Edge 50 Neo. Images possess a fixed 4:3 aspect ratio with ≈4000×3000 px resolution. Data was gathered from uncontrolled campus scenes with 11 subjects in various groupings.
Environments span urban streets, parks, rooftops, and riversides under diverse lighting (full sun, overcast, shade), with significant background complexity. All images were resized to 640×640 px to facilitate compatibility with YOLO-style detectors.
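The uniform 640×640 resize scales pixel coordinates by a different factor along each axis, which is what shrinks faces in high-resolution source images into the "tiny" regime. A small illustrative helper (the `scaled_box` name and the example box are hypothetical, not part of the dataset tooling):

```python
def scaled_box(box, orig_size, new_size=(640, 640)):
    """Scale an axis-aligned pixel box (x1, y1, x2, y2) from an image of
    orig_size (w, h) to new_size, as when resizing to 640x640 for YOLO."""
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# A 60 px-tall face in a 4000x3000 smartphone image ends up ~12.8 px tall
# after resizing -- illustrating why most BirdsEye-RU faces are "small".
print(scaled_box((1000, 1500, 1050, 1560), (4000, 3000)))
```

Note that a direct 640×640 resize does not preserve aspect ratio; normalized YOLO labels remain valid either way, since they are expressed relative to image dimensions.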
3. Annotation Specifications and Protocol
Annotations employ a single class (“face”) with axis-aligned bounding boxes formatted as (class, x_center, y_center, width, height), normalized to [0,1] consistent with YOLO standards. The annotation policy instructs the inclusion of every visibly identifiable face—even partial or occluded ones—provided a minimum post-resize size of ≈10 pixels is met. Bounding boxes are drawn tightly, encompassing hairlines and chins and intentionally omitting superfluous background.
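A label line of this kind can be decoded back to pixel coordinates; the sketch below assumes the standard YOLO convention of one `class x_center y_center width height` line per face (the helper name `yolo_to_pixels` is illustrative):

```python
def yolo_to_pixels(line: str, img_w: int = 640, img_h: int = 640):
    """Convert one YOLO label line 'cls xc yc w h' (normalized to [0,1])
    into a class id and a pixel-space box (x1, y1, x2, y2)."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return int(cls), (x1, y1, x2, y2)

# A centered face occupying 20x10 px in a 640x640 image:
print(yolo_to_pixels("0 0.5 0.5 0.031250 0.015625"))
# -> (0, (310.0, 315.0, 330.0, 325.0))
```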
Quality assurance was executed by manual review of Roboflow-generated annotations and a 10% inter-annotator cross-check. No format errors were detected; the DroneFace sample subset utilizes pre-existing author annotations, preserving methodological integrity.
4. Domain-Specific Challenges and Dataset Characteristics
BirdsEye-RU specifically targets technical impediments in overhead face detection:
- Scale Variation: Prevalence of faces below 20 px height operationalizes the “tiny face” detection paradigm.
- Background Clutter: Irregular scene elements, such as roads, trees, water bodies, and urban fixtures, introduce elevated ambiguity for detectors.
- Non-frontal Poses and Occlusions: Facial occlusions resulting from hats, grouping, and shadows are frequent.
- Sensor Diversity: The use of disparate hardware (GoPro, drones, smartphones) precipitates variation in image resolution, FOV, and lens distortion.
Representative samples (see Figure 1 in (Khan et al., 18 Jan 2026)) exemplify these intrinsic dataset complexities.
5. Evaluation Protocols and Baseline Guidance
The paper advocates for evaluation using COCO-style metrics:
- Primary Metrics: Precision, recall, AP at IoU = 0.50, and AP averaged over IoU thresholds from 0.50 to 0.95 are recommended.
- Average Precision: Defined as the area under the precision–recall curve, $\mathrm{AP} = \int_0^1 p(r)\,dr$, where $p(r)$ is precision as a function of recall.
- Suggested Baseline Models: Training YOLOv5, YOLOX, or RetinaFace on BirdsEye-RU, with reporting of AP, AP$_{50}$, AP$_{75}$, and Recall@100. No official baseline results are released; community submissions should use the standardized train/val/test splits.
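COCO-style matching hinges on the intersection-over-union between predicted and ground-truth boxes; a minimal reference implementation (the `iou` helper is illustrative, not released with the dataset):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Half-overlapping boxes: intersection 50, union 150 -> IoU ~0.333,
# so this detection would NOT count as a true positive at IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```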
A plausible implication is that BirdsEye-RU, with its abundance of tiny and occluded faces in cluttered environments, provides a discriminative test bed for model generalization in practical overhead scenarios.
6. Distribution, Licensing, and Accessibility
BirdsEye-RU is made freely available for non-commercial research under standard Kaggle terms. The dataset is devoid of personally identifiable information and obtained with ethical consent. Access and download are facilitated via Kaggle at https://www.kaggle.com/datasets/mdahanafarifkhan/birdseye-ru.
The directory structure conforms to established conventions:
- train/images, train/labels
- val/images, val/labels
- test/images, test/labels
- dataset.yaml (YOLO format specification)
- db_info.csv (source metadata and split information)
Ready-to-use YOLO configuration files and CSV metadata streamline custom split generation. The dataset fills a critical gap in the public domain for real overhead facial data, especially regarding the representation of very small faces (Khan et al., 18 Jan 2026).
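For orientation, a `dataset.yaml` in the usual Ultralytics YOLO shape might look as follows; the exact keys and paths in the released file may differ:

```yaml
# Hypothetical dataset.yaml sketch for BirdsEye-RU (single "face" class);
# consult the released file on Kaggle for the authoritative contents.
path: birdseye-ru
train: train/images
val: val/images
test: test/images
nc: 1
names: ["face"]
```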
7. Significance and Application Scope
BirdsEye-RU enables robust methodological development and benchmarking in overhead face detection, a sub-area marked by scale variance, sensor heterogeneity, and scene clutter. The dataset’s breadth supports research across surveillance, crowd analytics, and public-safety domains. Given the absence of pre-existing resources with comparable focus on small and distant faces under challenging environmental conditions, its release catalyzes both algorithmic advancements and empirical validation for overhead facial detection systems.