Nonlinear System Identification Nano-drone Benchmark

Published 16 Dec 2025 in eess.SY and cs.RO | (2512.14450v1)

Abstract: We introduce a benchmark for system identification based on 75k real-world samples from the Crazyflie 2.1 Brushless nano-quadrotor, a sub-50g aerial vehicle widely adopted in robotics research. The platform presents a challenging testbed due to its multi-input, multi-output nature, open-loop instability, and nonlinear dynamics under agile maneuvers. The dataset comprises four aggressive trajectories with synchronized 4-dimensional motor inputs and 13-dimensional output measurements. To enable fair comparison of identification methods, the benchmark includes a suite of multi-horizon prediction metrics for evaluating both one-step and multi-step error propagation. In addition to the data, we provide a detailed description of the platform and experimental setup, as well as baseline models highlighting the challenge of accurate prediction under real-world noise and actuation nonlinearities. All data, scripts, and reference implementations are released as open-source at https://github.com/idsia-robotics/nanodrone-sysid-benchmark to facilitate transparent comparison of algorithms and support research on agile, miniaturized aerial robotics.


Summary

The paper presents the first open benchmark for nonlinear system identification on nano-scale quadrotors, addressing a significant gap in embedded aerial robotics research.
It details an experimental setup with diverse trajectory designs and a comparison of modeling approaches, including physics-based, neural, and hybrid methods.
Results highlight the superior robustness of hybrid models and expose limitations in capturing aggressive nano-drone rotational dynamics.

Nonlinear System Identification Nano-drone Benchmark: An Expert Perspective

Introduction and Problem Formulation

The "Nonlinear System Identification Nano-drone Benchmark" (2512.14450) introduces the first open benchmark tailored for nonlinear system identification (SysID) on commercial nano-scale quadrotors. The benchmark is built on the Crazyflie 2.1 Brushless nano-quadrotor, a multirotor weighing under 50 g and measuring less than 10 cm in diameter, with onboard IMU, optical-flow, and laser-altimeter sensing, and with ground truth from motion capture. It is a challenging MIMO, nonlinear, underactuated dynamical system in the low-thrust, high-disturbance mini-UAV regime.

This benchmark is motivated by the lack of SysID-focused public datasets and protocols for nano-aerial robotics, in contrast to the availability and impact of benchmarks for larger multirotor platforms and perception-centric tasks (e.g., SLAM, VIO). The authors address this infrastructural gap by releasing complete datasets, open-source code, and standardized evaluation metrics for fair model comparison and reproducibility, facilitating cross-laboratory progress in robust SysID and model-based control for agile, highly constrained platforms.

Platform and Experimental Setup

The experimental infrastructure deploys Crazyflie 2.1 Brushless equipped with Flow-deck v2 (optical flow/ToF sensors) and AI-deck (GAP8 SoC, WiFi), operating in a dedicated, fully-surrounded OptiTrack motion capture arena.

Figure 2: A Crazyflie 2.1 brushless flying in the motion-capture-equipped laboratory.

Key hardware characteristics include 45 g total mass, symmetric inertia tensor, four 8 mm 10000 KV brushless motors with open-source ESCs (providing bidirectional DSHOT/RPM telemetry), and tight resource limitations characteristic of nano-class drones (STM32F4 MCU-centric processing).

The control stack is a cascaded geometric controller (SO(3)/$\mathbb{H}$ representation), executed at 500 Hz, that receives setpoints from a ground station via the NRF51822 radio (Crazyswarm2/ROS). State estimation fuses IMU, optical flow, ToF, and motion capture using an EKF at 100 Hz; careful clock synchronization, timestamp alignment, and filtering (windowed Butterworth, log-map for orientations) ensure coherent, low-noise datasets suitable for SysID.
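The log-map mentioned for orientations sends a unit quaternion to a rotation vector in $\mathbb{R}^3$, where an ordinary linear filter can be applied component-wise. The following is a minimal sketch of that map, not the authors' code: the scalar-first (w, x, y, z) convention and the function name `quat_log` are assumptions made for illustration.

```python
import math

def quat_log(q):
    """Map a unit quaternion (w, x, y, z) to a rotation vector (axis * angle, rad)."""
    w, x, y, z = q
    # q and -q encode the same rotation; keep the representative with w >= 0
    # so the result is the shortest-arc rotation vector.
    if w < 0.0:
        w, x, y, z = -w, -x, -y, -z
    v_norm = math.sqrt(x * x + y * y + z * z)
    if v_norm < 1e-12:
        # Small-angle limit: angle / sin(angle / 2) -> 2
        return (2.0 * x, 2.0 * y, 2.0 * z)
    angle = 2.0 * math.atan2(v_norm, w)
    scale = angle / v_norm
    return (scale * x, scale * y, scale * z)

# A 90-degree rotation about the body z-axis maps to (0, 0, pi/2).
quarter_turn = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
rotvec = quat_log(quarter_turn)
```

Filtering in this tangent space avoids averaging quaternion components directly, which would not in general return a unit quaternion.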

Trajectory Design and Dataset Characteristics

Four reference trajectories are engineered to induce sufficiently rich excitation for identification and validation: Square (planar), Random (probabilistic), Melon (coupled x-z elliptical with rotating plane and strong attitude excitation, test-only), and Chirp (frequency-rich multisine). These trajectories span both structured and unstructured, frequency- and axis-diverse regimes, with each trial including ≥3 repetitions.

Figure 4: Reference trajectories (top) and long-exposure images of actual flights (bottom), establishing repeatability and diversity for identification.

Output variables comprise a time-aligned, 100 Hz-sampled 13D state (position, velocity, attitude quaternion, angular velocity), matched to 4D motor RPM commands. The segmentation, filtering, and motor-acceleration delay compensation procedures minimize temporal bias and maximize the effective signal-to-noise ratio for nonlinear identification tasks.

Data splits segregate Square, Random, and Chirp for training (≈56k samples, 74%) and Melon for test/validation (≈19k samples, 26%). The test set is intentionally excluded from model development to ensure authentic, blind generalization assessment.

Modeling Baselines: Physics, Residuals, and Hybridization

The paper establishes performance baselines spanning physics-driven, purely data-driven, and hybrid (physics + residual) SysID pipelines.

Physical Model: Implements full rigid-body quadrotor dynamics, with propeller force and torque modeled as quadratic functions of rotor angular velocity. The thrust and moment coefficients ($k_F$, $k_M$) are identified via least-squares regression using measured body acceleration ($a^{\mathrm{IMU}}$) and angular acceleration ($\dot{\bm{\omega}}$).
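To make the regression concrete, here is a minimal sketch of a least-squares fit for the thrust coefficient alone. It assumes a single scalar $k_F$ shared by all four rotors and that the body-z specific force times mass equals total thrust, $m\,a_z = k_F \sum_i \omega_i^2$. The function name and all numbers except the 45 g mass are invented for illustration and are not taken from the paper.

```python
import numpy as np

def fit_thrust_coefficient(omega, a_z_imu, mass):
    """Least-squares k_F for  mass * a_z = k_F * sum_i omega_i**2.

    omega   : (N, 4) rotor speeds in rad/s
    a_z_imu : (N,) body-z specific force in m/s^2
    """
    s = np.sum(omega ** 2, axis=1)   # regressor: sum of squared rotor speeds
    y = mass * a_z_imu               # target: total thrust in newtons
    return float(s @ y / (s @ s))    # closed-form solution of the 1-D problem

# Synthetic check with made-up values (assumptions, not identified parameters).
rng = np.random.default_rng(0)
mass = 0.045                                      # kg, the 45 g quoted above
k_true = 3.0e-8                                   # hypothetical coefficient
omega = rng.uniform(1500.0, 2500.0, size=(500, 4))
a_z = k_true * np.sum(omega ** 2, axis=1) / mass
a_z_noisy = a_z + rng.normal(0.0, 0.05, size=a_z.shape)
k_hat = fit_thrust_coefficient(omega, a_z_noisy, mass)
```

The moment coefficient $k_M$ can be fitted the same way, with the regressor built from the signed sum of squared rotor speeds and the target built from the measured yaw angular acceleration.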

Figure 1: Comparison of measured and model-predicted thrust and torques, illustrating high fidelity for thrust but limited accuracy for torques using quadratic model.

Model structure:

State propagation via RK4 integration;
Inputs: motor RPMs;
Outputs: 13D state vector (with quaternion orientation);
Torque and force mapping follows the standard rigid-body model with a cross-coupling matrix.

Figure 3: Lateral body-frame force components that expose modeling mismatch with the quadratic thrust/torque map, emphasizing real-world non-idealities.
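The model structure listed above can be sketched as a single RK4 step over the 13D state (position, velocity, quaternion, body rates). This is an illustrative implementation under stated assumptions, not the reference code. The inertia, arm length, coefficients, rotor ordering, and mixer signs are placeholders, the quaternion is scalar-first, and the rotor speeds are held constant over the step.

```python
import numpy as np

# Illustrative parameters: only the 45 g mass comes from the text above;
# everything else is an assumed placeholder.
MASS, G = 0.045, 9.81
J = np.diag([2.0e-5, 2.0e-5, 4.0e-5])        # kg m^2 (assumed)
K_F, K_M, ARM = 3.0e-8, 1.0e-10, 0.035       # assumed coefficients and arm offset

# Maps squared rotor speeds to [thrust, roll, pitch, yaw torque] (assumed layout).
MIXER = np.array([
    [K_F, K_F, K_F, K_F],
    [-ARM * K_F, -ARM * K_F, ARM * K_F, ARM * K_F],
    [-ARM * K_F, ARM * K_F, ARM * K_F, -ARM * K_F],
    [-K_M, K_M, -K_M, K_M],
])

def rotate(q, v):
    """Rotate body-frame vector v into the world frame; q = (w, x, y, z), unit norm."""
    w, u = q[0], q[1:]
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def dynamics(x, rotor_speeds):
    """Time derivative of the 13D state [p, v, q, body rates]."""
    v, q, w = x[3:6], x[6:10], x[10:13]
    wrench = MIXER @ rotor_speeds ** 2
    thrust, tau = wrench[0], wrench[1:]
    acc = rotate(q, np.array([0.0, 0.0, thrust / MASS])) - np.array([0.0, 0.0, G])
    # Quaternion kinematics: q_dot = 0.5 * q * (0, w)
    q_dot = 0.5 * np.concatenate(([-q[1:] @ w], q[0] * w + np.cross(q[1:], w)))
    w_dot = np.linalg.solve(J, tau - np.cross(w, J @ w))
    return np.concatenate((v, acc, q_dot, w_dot))

def rk4_step(x, rotor_speeds, dt):
    """One classical Runge-Kutta step, followed by quaternion renormalization."""
    k1 = dynamics(x, rotor_speeds)
    k2 = dynamics(x + 0.5 * dt * k1, rotor_speeds)
    k3 = dynamics(x + 0.5 * dt * k2, rotor_speeds)
    k4 = dynamics(x + dt * k3, rotor_speeds)
    x_next = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    x_next[6:10] /= np.linalg.norm(x_next[6:10])
    return x_next

# Usage: at the hover rotor speed, a vehicle at rest should stay at rest.
x0 = np.zeros(13)
x0[6] = 1.0                                   # identity orientation
hover = np.full(4, np.sqrt(MASS * G / (4.0 * K_F)))
x1 = rk4_step(x0, hover, dt=0.01)
```

The renormalization after each step keeps the quaternion on the unit sphere, which plain RK4 does not guarantee.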

The analysis reveals that while thrust prediction along the z-axis is accurate, lateral body-frame forces and pitch/roll/yaw torques are significantly misestimated, particularly for fast, aggressive trajectories. Persistent residuals indicate unmodeled dynamics, e.g., aerodynamic interactions, saturations, or motor torque lags, that violate the quadratic model assumption.

Figure 5: Detailed comparison between measured and predicted yaw torque, further highlighting systematic deviations beyond noise.

Black-box Models: Feedforward MLPs and LSTMs are trained for direct state prediction using windowed input-output chunks ($H=50$ steps, i.e., 0.5 s at 100 Hz). These models predict state increments (residuals), which are accumulated recursively to simulate future trajectories. The residual approach mitigates drift for short horizons but exhibits bias accumulation and open-loop divergence over longer runs.
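The recursive accumulation of predicted increments can be sketched as follows. The trained network is replaced here by a toy increment function (a double integrator), and `rollout`, `toy_increment`, and `biased_increment` are hypothetical names. The sketch also ignores the quaternion renormalization that a real 13D rollout would need.

```python
import numpy as np

DT = 0.01  # 100 Hz sampling, as in the dataset

def rollout(x0, inputs, increment_fn):
    """Open-loop rollout: x[k+1] = x[k] + increment_fn(x[k], u[k])."""
    states = [np.asarray(x0, dtype=float)]
    for u in inputs:
        states.append(states[-1] + increment_fn(states[-1], u))
    return np.stack(states[1:])      # (H, state_dim): predictions for steps 1..H

def toy_increment(x, u):
    """Stand-in for a learned model: state = (position, velocity), u = acceleration."""
    return DT * np.array([x[1], u])

def biased_increment(x, u):
    """Same model with a small constant error in the velocity increment."""
    return toy_increment(x, u) + np.array([0.0, 1e-3])

inputs = np.ones(50)                 # H = 50 steps of constant input
nominal = rollout([0.0, 0.0], inputs, toy_increment)
drift = rollout([0.0, 0.0], inputs, biased_increment) - nominal
```

In this toy case a per-step bias of 1e-3 in the velocity increment grows linearly to 0.05 after 50 steps, and the position error grows faster still, which is the mechanism behind the long-horizon divergence described above.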

Hybrid Physical+Residual: A composite predictor combines the physics-based prediction with a neural residual correction, yielding enhanced robustness and stability, especially for drift-prone variables (e.g., velocities and attitude). The physics prior contributes domain knowledge that improves the bias-variance trade-off.
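A minimal sketch of the physics-plus-residual idea on a one-dimensional toy system is given below. To keep it short, the neural residual is swapped for a linear least-squares correction, and the "real" system is an invented velocity model with drag that the nominal physics omits. None of the names or numbers come from the paper.

```python
import numpy as np

DT = 0.01

def true_step(v, u):
    """Toy 'real system': velocity with a drag term the nominal model lacks."""
    return v + DT * (u - 0.8 * v)

def physics_step(v, u):
    """Nominal physics model: pure integrator, no drag."""
    return v + DT * u

# Fit a linear residual r(v, u) = theta . [v, u, 1] to the one-step physics error.
rng = np.random.default_rng(1)
v_data = rng.uniform(-2.0, 2.0, 400)
u_data = rng.uniform(-5.0, 5.0, 400)
features = np.column_stack((v_data, u_data, np.ones_like(v_data)))
target = true_step(v_data, u_data) - physics_step(v_data, u_data)
theta, *_ = np.linalg.lstsq(features, target, rcond=None)

def hybrid_step(v, u):
    """Physics prediction plus learned residual correction."""
    return physics_step(v, u) + np.array([v, u, 1.0]) @ theta

physics_error = abs(physics_step(1.0, 2.0) - true_step(1.0, 2.0))
hybrid_error = abs(hybrid_step(1.0, 2.0) - true_step(1.0, 2.0))
```

Because the residual only has to explain what the physics model misses, it can be a much simpler function than a model that learns the full dynamics from scratch.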

Evaluation Protocol and Quantitative Results

The evaluation protocol centers on mean absolute error (MAE) per prediction horizon for position, velocity, orientation (SO(3) geodesic distance), and angular velocity, across rolling windows up to $H=50$ steps (0.5 s horizon).
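The two ingredients of this protocol, a geodesic orientation error and a per-horizon MAE, can be sketched as below. How the benchmark aggregates vector errors (per component or by norm) is not stated here, so the per-component average is an assumption, as are the function names and the scalar-first quaternion convention.

```python
import numpy as np

def geodesic_angle(q_pred, q_true):
    """Rotation angle in radians between unit quaternions of shape (..., 4)."""
    # abs() makes the metric insensitive to the q / -q sign ambiguity.
    dot = np.abs(np.sum(q_pred * q_true, axis=-1))
    return 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))

def multi_horizon_mae(pred, truth):
    """pred, truth: (num_windows, H, dim) -> (H,) MAE at each horizon step."""
    return np.mean(np.abs(pred - truth), axis=(0, 2))

# Toy example: two windows, three horizon steps, error growing with the horizon.
truth = np.zeros((2, 3, 2))
pred = truth + np.array([0.1, 0.2, 0.3])[None, :, None]
mae_per_step = multi_horizon_mae(pred, truth)

identity = np.array([1.0, 0.0, 0.0, 0.0])
quarter_turn = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
angle = geodesic_angle(quarter_turn, identity)
```

Applying `geodesic_angle` to arrays of shape (num_windows, H, 4) and averaging over the window axis gives the orientation error curve on the same horizon grid as the other variables.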

Figure 6: Multi-horizon MAE curves across models, benchmarking the degradation profile for each variable/pipeline.

Major findings (Table summary):

Short-horizon (h=1): All learned models significantly outperform the naive integrator. Physics and Phys+Res yield the lowest MAE, especially for position (< 2 mm) and linear velocity.
Long-horizon (h=50): Error accumulation is mitigated best by Phys+Res and physical-only models; black-box models (especially LSTM) drift substantially, particularly in position/velocity due to weak inductive biases.
Rotational Kinematics: For attitude and angular velocity, all models struggle due to unmodeled actuation dynamics, with minimal improvement from neural corrections at current time scales/horizons.
Computational Efficiency: Inference time per prediction on an STM32 F4-class MCU is measured for all models. Pure MLPs have the lowest per-step compute cost, LSTMs are roughly $2\times$ heavier, and physics-based models are penalized by matrix operations.

Figure 7: Segmental prediction traces (t=20–25 s, Melon test), contrasting 50-step open-loop rollouts against ground truth, highlighting error build-up in neural models.

Implications and Future Directions

This benchmark makes three main contributions to the nano-UAV research community:

First public real-world dataset, tools, and evaluation protocols for nano-drone SysID: This enables repeatable, rigorous, and scalable research, lowering the entry barrier for academia and industry, particularly for embedded model-based predictive control, reinforcement learning, and robust nonlinear identification.
Demonstrates limitations of standard quadratic modeling: The inability of physics or hybrid models to capture certain rotational torques (see Figure 5, Figure 6) on aggressive nano-drones points to unmodeled nonlinearities or slow-scale dynamics, suggesting aeroelastic, unsteady-flow, motor-lag, or interaction terms not reflected in canonical expressions.
Hybrid approaches informed by physics are most robust: Pure neural models are prone to bias accumulation and instability in open-loop rollout settings, a critical consideration for deployment on real platforms with fast update rates and resource restrictions.

Practically, the benchmark supports research on embedded adaptive control, sim-to-real transfer (domain randomization, residual learning), and rapid evaluation of new dynamical models, with reproducible and cross-comparable results. Theoretically, the data exposes a gap in the multi-timescale, high-fidelity actuation modeling needed for next-generation miniaturized aerial vehicles, raising open questions about the identification of coupled, unsteady, and resource-constrained robotic systems.

Conclusion

The Nano-drone SysID benchmark sets a modern standard for experimental system identification in the nano-UAV domain by providing comprehensive datasets, detailed protocols, and a suite of strong, reproducible baselines. The dataset enables advances in data-driven, hybrid, and physics-enhanced SysID and control, while exposing the critical open problem of robustly modeling nano-scale rotational dynamics under aggressive maneuvers. Progress on these fronts will likely require both more expressive learning architectures (with longer context and explicit temporal modeling) and fundamentally improved physical models. Overall, the work supports principled development and evaluation of robust, agile, embedded control algorithms for miniaturized aerial robotics.
