Sufficiency of NeuralFoil training dataset size

Determine whether the 7,913,292 XFoil-generated aerodynamic cases used to train NeuralFoil form a dataset that is too small, too large, or appropriately sized for achieving the reported model performance.

Background

NeuralFoil was trained on synthetic data generated by running XFoil across randomized airfoil shapes and flow conditions, yielding 7,913,292 cases. Of these, 56% (roughly 4.4 million) converged and were used to train the physics outputs. The dataset spans a wide range of angles of attack, Reynolds numbers, and transition settings to promote generalization.

The authors note that this dataset is roughly two orders of magnitude larger than those used in similar studies, but they do not know whether such a volume is necessary. Clarifying the relationship between dataset size and model performance would inform data generation strategies and the sample complexity needed for comparable accuracy.
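One common way to probe this kind of question is a learning-curve study: train the same model on nested subsets of the data, record the validation error at each subset size, and fit a trend. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes the error follows a pure power law, error ≈ a · N^(-b), with no irreducible error floor. The function names `fit_power_law` and `size_for_target` are hypothetical, and the data points are synthetic placeholders, not results from NeuralFoil.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit error ~= a * N**(-b) by linear least squares in log-log space.

    Assumption: a pure power law with no irreducible error floor.
    Returns the tuple (a, b).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)


def size_for_target(a, b, target_error):
    """Invert error = a * N**(-b) to estimate the N that reaches target_error."""
    return (a / target_error) ** (1.0 / b)


# Synthetic placeholder learning curve (NOT measured NeuralFoil data):
# validation error generated from a known power law at four subset sizes.
sizes = np.array([1e4, 1e5, 1e6, 4e6])
errors = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, errors)
n_needed = size_for_target(a, b, target_error=errors[2])
```

If the fitted exponent b is small, or the measured curve flattens well before the full dataset size, that would suggest the dataset is larger than needed; a curve still falling steeply at the full size would suggest the opposite. With real data the points will be noisy, and a model with an explicit error floor may fit better than the pure power law assumed here.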

References

It is not currently known whether the number of points in this dataset is too little, too much, or appropriate.

NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning (2503.16323 - Sharpe et al., 20 Mar 2025) in Section 2.6 (Training Data Generation)