Optimized Importance & Percentage AutoFS
- The paper demonstrates that integrating importance scores with a percentage-based selection enhances intrusion detection performance while reducing model dimensionality.
- The methodology combines data-driven feature ranking with explicit percentage constraints to enable efficient multi-objective optimization in AutoML pipelines.
- Empirical results on CICIDS2017 show improved F₁-scores (up to 99.773%), reduced latency (0.0031 ms/sample), and lower calibration errors.
Optimized Importance and Percentage-based Automated Feature Selection (OIP-AutoFS) is a feature selection methodology designed for integration into Automated Machine Learning (AutoML) pipelines, particularly in multi-objective optimization contexts where model accuracy, confidence, and computational efficiency are targeted simultaneously. It addresses the need for scalable, interpretable, and deployment-efficient feature selection, which is critical in resource-constrained environments such as IoT and edge computing, by automatically combining importance-driven ranking with percentage-based subset selection (Yang et al., 11 Nov 2025).
1. Conceptual Overview and Motivation
OIP-AutoFS is motivated by the computational and statistical challenges encountered in security-focused AutoML pipelines. Traditional feature selection approaches, such as wrapper and filter methods, often fail to scale to high-dimensional datasets and can induce unnecessary computational overhead or fail to optimize for downstream model efficiency. OIP-AutoFS is introduced within an Intrusion Detection System (IDS) pipeline that integrates all four canonical AutoML stages: data preprocessing, feature selection, model/hyperparameter selection, and validation, emphasizing not only detection effectiveness but also confidence and efficiency (Yang et al., 11 Nov 2025).
The core innovation is the explicit optimization of feature sets according to both their learned importance (e.g., Gini, gain, permutation importance from tree models) and a user- or system-defined percentage constraint, resulting in streamlined feature sets that retain discriminative power for detection while minimizing computational and memory footprints.
2. Formal Definition and Selection Procedure
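Given a preprocessed dataset $(X, y)$, where $X \in \mathbb{R}^{n \times d}$ is the feature matrix, OIP-AutoFS is applied as follows:
- Let $I_j$ denote the computed importance score for feature $j$ (using a base estimator, typically XGBoost or LightGBM due to their robust feature importance metrics).
- Given a percentage $p \in (0, 1]$, select the top-$k$ features, where $k = \lceil p \cdot d \rceil$ (rounding up is assumed here) and $d$ is the total feature dimension post-processing.
- The subset $S$ of cardinality $k$ is given by ordering features in descending $I_j$ and selecting the indices $\pi(1), \dots, \pi(k)$, where $\pi$ is the resulting ordering.
Mathematically:
$$S = \{\pi(1), \dots, \pi(k)\}, \qquad I_{\pi(1)} \ge I_{\pi(2)} \ge \dots \ge I_{\pi(d)}, \qquad k = \lceil p \cdot d \rceil.$$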
The operating principle is that the feature ranking is data-driven (importance-based), while the selection cardinality is governed by a tunable, deployment-aware percentage constraint (Yang et al., 11 Nov 2025).
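The selection rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `select_top_percent`, the ceiling rounding, and the example scores are all assumptions, and in practice the importance scores would come from a fitted tree-based estimator.

```python
import math

def select_top_percent(importances, percent):
    """Return indices of the top ceil(percent * d) features by importance.

    `percent` is a fraction in (0, 1]; ceiling rounding is an assumption.
    """
    if not 0.0 < percent <= 1.0:
        raise ValueError("percent must be in (0, 1]")
    d = len(importances)
    # round() guards against float noise such as 0.1 * 30 == 3.0000000000000004
    k = max(1, math.ceil(round(percent * d, 9)))
    # Sort feature indices by descending importance; ties keep original order.
    ranked = sorted(range(d), key=lambda j: -importances[j])
    return ranked[:k]

# Made-up importance scores for six features (illustrative only).
scores = [0.05, 0.30, 0.10, 0.25, 0.02, 0.28]
selected = select_top_percent(scores, 0.5)  # keeps 3 of 6 features
```

The ranking depends only on the data-driven scores, while `percent` alone decides how many features survive, which mirrors the separation described above.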
3. Integration within Multi-Objective AutoML Pipelines
OIP-AutoFS functions as the feature selection (AutoFS) module within the MOO-AutoML IDS pipeline (Yang et al., 11 Nov 2025). It may be preceded by an automated data preprocessing (AutoDP) block (e.g., automated missing-value imputation and encoding). The reduced feature matrix is then passed to the downstream optimizer modules, in particular the Optimized Performance, Confidence, and Efficiency-based Combined Algorithm Selection and Hyperparameter Optimization (OPCE-CASH) block, which runs a multi-objective evolutionary search (typically via Multi-Objective PSO) over both algorithm/hyperparameter choices and the selected feature subset.
The feature percentage is exposed as a parameter to the optimizer, making feature set size an explicit part of the search space. This enables joint optimization of feature cardinality, model type, and hyperparameters for overall pipeline efficacy.
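To make the idea of a joint search space concrete, the sketch below treats the feature percentage as one more search dimension alongside the model choice and keeps only the non-dominated configurations. It uses exhaustive enumeration with a Pareto-dominance filter, not Multi-Objective PSO, and the `evaluate` function is a hypothetical stand-in that returns toy numbers rather than real measurements.

```python
from itertools import product

def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep candidates whose objectives are not dominated by any other."""
    return [c for c in candidates
            if not any(dominates(o["objectives"], c["objectives"])
                       for o in candidates)]

def evaluate(model, percent):
    """Hypothetical stand-in for training and validating one pipeline.
    Returns (error, relative latency) from a toy formula, not real data."""
    base_error = {"lightgbm": 0.010, "xgboost": 0.012}[model]
    error = base_error + 0.02 * (1.0 - percent) ** 2
    latency = {"lightgbm": 1.0, "xgboost": 1.5}[model] * percent
    return (error, latency)

# Feature percentage is searched jointly with the model choice.
search_space = product(["lightgbm", "xgboost"], [0.1, 0.2, 0.5, 1.0])
candidates = [{"model": m, "percent": p, "objectives": evaluate(m, p)}
              for m, p in search_space]
front = pareto_front(candidates)
```

A real optimizer would replace the enumeration with a guided search and `evaluate` with actual cross-validated measurements; the dominance filter is the part that carries over.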
4. Impact on Detection Performance, Efficiency, and Calibration
Empirical evaluations demonstrate that OIP-AutoFS leads to significant dimensionality reduction—often by 50–80%—without compromising, and in many cases improving, detection performance or calibration quality (Yang et al., 11 Nov 2025). The main empirical findings include:
- On the CICIDS2017 dataset, OIP-AutoFS combined with OPCE-CASH yields F₁-scores of up to 99.773% with a feature set reduced to 18% of the original size (as chosen by the Pareto-optimal tradeoff).
- Inference latency and model size are sharply reduced; for example, LightGBM with the OIP-AutoFS selected feature set yields inference times as low as 0.0031 ms per sample and model sizes as small as 0.42 MB.
- Expected Calibration Error (ECE) falls to 0.01–0.03%, far below prior AutoML baselines; this is attributed to increased model simplicity and more homogeneous input distributions (Yang et al., 11 Nov 2025).
The significance is that OIP-AutoFS enables edge- and IoT-deployable IDS models with both high accuracy and stringent efficiency constraints, addressing challenges in network security for resource-constrained environments.
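For readers unfamiliar with the calibration metric, the following is a minimal sketch of the standard binned ECE computation: the size-weighted average gap between accuracy and mean confidence across confidence bins. The equal-width binning, the bin count of 10, and the function name are assumptions; the source does not state which ECE variant was used. The function returns a fraction, so multiply by 100 to compare with the percentages quoted above.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0
        # is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

# Four predictions at 90% confidence, three of them correct:
# accuracy is 0.75, so the calibration gap is about 0.15.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```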
5. Comparison to Prior Automated Feature Selection Techniques
Prior approaches to automated feature selection in the AutoML context generally fall into the following categories:
| Method Class | Dimensionality Constraint | Importance-Driven | Joint Pareto Optimization with CASH |
|---|---|---|---|
| Filter/wrapper | Implicit | Some | No |
| Embedded | Model-dependent | Yes | No |
| AutoML-integrated | Weak | Often | Weak |
| OIP-AutoFS | Explicit (percentage) | Yes | Yes (full MOO) |
OIP-AutoFS distinguishes itself through explicit and tunable dimensionality control, rigorous integration into the broader pipeline, and optimization-driven feature set selection aligned with multiple downstream objectives—none of which are jointly offered by previous methods in the AutoML/security literature (Yang et al., 11 Nov 2025).
6. Algorithmic and Computational Properties
OIP-AutoFS offers favorable algorithmic characteristics for real-world deployment and high-throughput pipelines:
- Computational complexity is dominated by the initial importance computation (typically with tree-based estimators), plus a negligible $O(d \log d)$ for sorting the $d$ importance scores.
- The method is fully parallelizable up to the batch size supported by the hardware/software stack.
- The tight coupling with the percentage parameter allows rapid adjustment to resource constraints (e.g., shrinking the feature set under memory or latency pressure).
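One consequence of these properties is that the ranking only needs to be computed once: changing the percentage afterwards is just a matter of taking a shorter or longer prefix. The sketch below illustrates this under assumed names (`rank_features`, `subset_for_percent`) and made-up scores; it is not taken from the source.

```python
import math

def rank_features(importances):
    """Sort feature indices once by descending importance: O(d log d)."""
    return sorted(range(len(importances)), key=lambda j: -importances[j])

def subset_for_percent(ranking, percent):
    """Take a prefix of a precomputed ranking; no re-sorting or refitting.
    Ceiling rounding is an assumption."""
    k = max(1, math.ceil(round(percent * len(ranking), 9)))
    return ranking[:k]

importances = [0.05, 0.30, 0.10, 0.25, 0.02, 0.28]  # made-up scores
ranking = rank_features(importances)

# Re-slicing for a tighter budget reuses the same ranking.
subsets = {p: subset_for_percent(ranking, p) for p in (0.2, 0.5, 1.0)}
```

Because every subset is a prefix of the same ranking, the feature sets are nested: shrinking the budget only ever drops the lowest-ranked features.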
Combined with AutoDP and OPCE-CASH, the pipeline can complete its feature selection and downstream selection/optimization in minutes on a single CPU, making it suitable for practical autonomous scenarios (Yang et al., 11 Nov 2025).
7. Practical Deployment and Reconfigurability
OIP-AutoFS supports deployment-aware reconfiguration, whereby the feature percentage and importance metric can be adjusted in an application-driven manner. Two key Pareto-optimal variants are typically produced: a minimal-feature, maximal-efficiency variant (for on-device retraining in edge/IoT nodes) and a balanced variant (for cloud-assisted inference where computational resources are less restricted). This flexibility is essential for the broad applicability of OIP-AutoFS across heterogeneous deployment environments. The explicit exposure of the percentage hyperparameter further enables the optimizer to tune the tradeoff between detection robustness and resource utilization in a data-adaptive manner (Yang et al., 11 Nov 2025).
In summary, OIP-AutoFS formalizes a scalable and automated feature selection strategy, optimized for both importance and dimensionality constraints, and tightly integrated within multi-objective AutoML pipelines where efficiency, accuracy, and reliability are all critical. Its reported results on CICIDS2017 support its utility for AutoML-based intrusion detection in IoT and edge contexts (Yang et al., 11 Nov 2025).