
Optimized Importance & Percentage AutoFS

Updated 18 November 2025
  • The paper demonstrates that integrating importance scores with a percentage-based selection enhances intrusion detection performance while reducing model dimensionality.
  • The methodology combines data-driven feature ranking with explicit percentage constraints to enable efficient multi-objective optimization in AutoML pipelines.
  • Empirical results on CICIDS2017 show improved F₁-scores (up to 99.773%), reduced latency (0.0031 ms/sample), and lower calibration errors.

Optimized Importance and Percentage-based Automated Feature Selection (OIP-AutoFS) is a feature selection method designed for integration into Automated Machine Learning (AutoML) pipelines, particularly multi-objective optimization settings that target model accuracy, confidence, and computational efficiency simultaneously. It addresses the need for scalable, interpretable, and deployment-efficient feature selection, which is critical in resource-constrained environments such as IoT and edge computing, by automatically combining importance-driven ranking with percentage-based subset selection (Yang et al., 11 Nov 2025).

1. Conceptual Overview and Motivation

OIP-AutoFS is motivated by the computational and statistical challenges encountered in security-focused AutoML pipelines. Traditional feature selection approaches, such as wrapper and filter methods, often fail to scale to high-dimensional datasets and can induce unnecessary computational overhead or fail to optimize for downstream model efficiency. OIP-AutoFS is introduced within an Intrusion Detection System (IDS) pipeline that integrates all four canonical AutoML stages: data preprocessing, feature selection, model/hyperparameter selection, and validation, emphasizing not only detection effectiveness but also confidence and efficiency (Yang et al., 11 Nov 2025).

The core innovation is the explicit optimization of feature sets according to both their learned importance (e.g., Gini, gain, permutation importance from tree models) and a user- or system-defined percentage constraint, resulting in streamlined feature sets that retain discriminative power for detection while minimizing computational and memory footprints.

2. Formal Definition and Selection Procedure

Given a preprocessed dataset $D = \{X, Y\}$, where $X \in \mathbb{R}^{n \times d}$ is the feature matrix, OIP-AutoFS is applied as follows:

  • Let $I_j$ denote the computed importance score for feature $f_j$ (using a base estimator, typically XGBoost or LightGBM due to their robust feature importance metrics).
  • Given a percentage $p$ (e.g., $20\%$, $40\%$), select the top-$k$ features, where $k = \lceil p \cdot d \rceil$ and $d$ is the total feature dimension post-processing.
  • The subset $S \subset \{1, \ldots, d\}$ of cardinality $|S| = k$ is given by ordering features in descending $I_j$ and selecting indices $j_1, \ldots, j_k$.

Mathematically, with the argsort taken in descending order of $I_j$:

$$S = \operatorname{argsort}_{j}(I_j)[:k], \quad k = \lceil p \cdot d \rceil$$

The operating principle is that the feature ranking is data-driven (importance-based), while the selection cardinality is governed by a tunable, deployment-aware percentage constraint (Yang et al., 11 Nov 2025).
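The selection rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_top_fraction` is hypothetical, `p` is passed as a fraction in (0, 1] rather than a percentage, and the importance scores are assumed to have been computed already by a base estimator.

```python
import math

def select_top_fraction(importances, p):
    """Return indices of the top ceil(p * d) features by descending importance.

    `importances` is a list of precomputed scores I_j (assumed input);
    `p` is the selection fraction in (0, 1].
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    d = len(importances)
    # round() guards against float noise, e.g. 0.1 * 30 == 3.0000000000000004,
    # which would otherwise make ceil() select one feature too many.
    k = math.ceil(round(p * d, 9))
    # sorted() is stable, so tied scores keep their original column order.
    order = sorted(range(d), key=lambda j: -importances[j])
    return order[:k]

# Illustrative scores only (not taken from the paper).
scores = [0.05, 0.30, 0.10, 0.25, 0.02, 0.28]
selected = select_top_fraction(scores, 0.4)  # d = 6, k = ceil(2.4) = 3
print(selected)  # [1, 5, 3]
```

Rounding before the ceiling is a design choice of this sketch; the source does not say how boundary cases of $p \cdot d$ are handled.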

3. Integration within Multi-Objective AutoML Pipelines

OIP-AutoFS functions as the feature selection (AutoFS) module within the MOO-AutoML IDS pipeline (Yang et al., 11 Nov 2025). Preceding OIP-AutoFS may be an automated data preprocessing (AutoDP) block (e.g., automated missing value imputation, encoding). Subsequent to OIP-AutoFS, the reduced feature matrix is provided to downstream optimizer modules, particularly the Optimized Performance, Confidence, and Efficiency-based Combined Algorithm Selection and Hyperparameter Optimization (OPCE-CASH) block, which executes a multi-objective evolutionary search (typically via Multi-Objective PSO) over both algorithm/hyperparameter choices and the selected feature subset.

The feature percentage $p$ is exposed as a parameter to the optimizer, making feature set size an explicit part of the search space. This enables joint optimization of feature cardinality, model type, and hyperparameters for overall pipeline efficacy.
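To make the idea of treating $p$ as a search dimension concrete, the sketch below filters a set of evaluated pipeline candidates down to its Pareto front. It is an assumption-laden illustration: the candidate records, the field names, and the choice of objectives (error, latency, and $p$, all minimized) are hypothetical, and a plain dominance filter stands in for the multi-objective PSO search described above.

```python
# Hypothetical evaluated candidates: the numbers are made up for
# illustration and are NOT results from the paper.
candidates = [
    {"p": 0.1, "f1": 0.990, "latency_ms": 0.002},
    {"p": 0.2, "f1": 0.997, "latency_ms": 0.003},
    {"p": 0.5, "f1": 0.996, "latency_ms": 0.006},
    {"p": 1.0, "f1": 0.998, "latency_ms": 0.010},
]

def objectives(c):
    """Map a candidate to a tuple of objectives that are all minimized."""
    return (1.0 - c["f1"], c["latency_ms"], c["p"])

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(cands):
    """Keep the candidates that no other candidate dominates."""
    scored = [(c, objectives(c)) for c in cands]
    return [c for c, oc in scored
            if not any(dominates(oo, oc) for _, oo in scored)]

front_p = [c["p"] for c in pareto_front(candidates)]
print(front_p)  # [0.1, 0.2, 1.0]
```

In this toy data the candidate with `p = 0.5` is dropped because the `p = 0.2` candidate is better on every objective, which is the kind of trade-off the optimizer can only discover when feature set size is part of the search space.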

4. Impact on Detection Performance, Efficiency, and Calibration

Empirical evaluations report that OIP-AutoFS achieves substantial dimensionality reduction, often by 50–80%, without compromising, and in many cases improving, detection performance or calibration quality (Yang et al., 11 Nov 2025). The main empirical findings include:

  • On the CICIDS2017 dataset, OIP-AutoFS combined with OPCE-CASH yields F₁-scores of up to 99.773% with a feature set reduced to 18% of the original size (as chosen by the Pareto-optimal tradeoff).
  • Inference latency and model size are sharply reduced; for example, LightGBM with the OIP-AutoFS selected feature set yields inference times as low as 0.0031 ms per sample and model sizes as small as 0.42 MB.
  • Expected Calibration Error (ECE) is improved due to increased model simplicity and more homogeneous input distributions, e.g., ECE reduced to 0.01–0.03%, far below prior AutoML baselines (Yang et al., 11 Nov 2025).

The significance is that OIP-AutoFS enables edge- and IoT-deployable IDS models with both high accuracy and stringent efficiency constraints, addressing challenges in network security for resource-constrained environments.
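For readers unfamiliar with the calibration metric cited above, the following is a minimal sketch of Expected Calibration Error in its standard binned form: the weighted average, over confidence bins, of the gap between accuracy and mean confidence. The bin count of 10 and the equal-width binning are assumptions of this sketch; the source does not state which ECE variant was used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|.

    `confidences` are predicted-class probabilities in [0, 1];
    `correct` are booleans marking whether each prediction was right.
    Bins are equal-width [lo, hi), with confidence 1.0 placed in the last bin.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

# Toy inputs (illustrative only): 75% confidence and 3 of 4 correct,
# so the single occupied bin is perfectly calibrated.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
# 90% confidence but only 1 of 2 correct, so the gap is 0.4.
overconfident = expected_calibration_error([0.9, 0.9], [True, False])
```

The function returns a fraction in [0, 1]; multiply by 100 to compare against values quoted as percentages.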

5. Comparison to Prior Automated Feature Selection Techniques

Prior approaches to automated feature selection in the AutoML context generally fall into the following categories:

| Method Class | Dimensionality Constraint | Importance-Driven | Pareto/Joint with CASH |
|---|---|---|---|
| Filter/wrapper | Implicit | Some | No |
| Embedded | Model-dependent | Yes | No |
| AutoML-integrated | Weak | Often | Weak |
| OIP-AutoFS | Explicit (percentage) | Yes | Yes (full MOO) |

OIP-AutoFS distinguishes itself through explicit and tunable dimensionality control, rigorous integration into the broader pipeline, and optimization-driven feature set selection aligned with multiple downstream objectives—none of which are jointly offered by previous methods in the AutoML/security literature (Yang et al., 11 Nov 2025).

6. Algorithmic and Computational Properties

OIP-AutoFS offers favorable algorithmic characteristics for real-world deployment and high-throughput pipelines:

  • Computational complexity is dominated by the initial importance computation (typically $O(nd)$ with tree-based estimators) and a negligible $O(d \log d)$ for sorting.
  • The method is fully parallelizable up to the batch size supported by the hardware/software stack.
  • Because the selection size is controlled by a single percentage parameter, the feature set can be adjusted quickly to resource constraints (e.g., shrinking it under memory or latency pressure).

Combined with AutoDP and OPCE-CASH, the pipeline can complete its feature selection and downstream selection/optimization in minutes on a single CPU, making it suitable for practical autonomous scenarios (Yang et al., 11 Nov 2025).
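As a short worked statement of the cost of the selection step, taking the $O(nd)$ estimate for importance computation as given and writing $T_{\text{select}}$ (a symbol introduced here for illustration) for the total:

$$T_{\text{select}}(n, d) = \underbrace{O(nd)}_{\text{importance}} + \underbrace{O(d \log d)}_{\text{sorting}} = O(nd) \quad \text{whenever } \log d = O(n).$$

The condition $\log d = O(n)$ holds for any dataset with more samples than the logarithm of its feature count, so in practice the sort does not affect the asymptotic cost.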

7. Practical Deployment and Reconfigurability

OIP-AutoFS supports deployment-aware reconfiguration, whereby the feature percentage and importance metric can be adjusted in an application-driven manner. Two key Pareto-optimal variants are typically produced: a minimal-feature, maximal-efficiency variant (for on-device retraining in edge/IoT nodes) and a balanced variant (for cloud-assisted inference where computational resources are less restricted). This flexibility is essential for the broad applicability of OIP-AutoFS across heterogeneous deployment environments. The explicit exposure of the percentage hyperparameter further enables the optimizer to tune the tradeoff between detection robustness and resource utilization in a data-adaptive manner (Yang et al., 11 Nov 2025).
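One way the two variants could be picked from a Pareto front is sketched below. This is a hedged illustration only: the source does not specify the selection rule, so the "balanced" choice here uses a min-max normalized sum of objectives as a stand-in, and the front entries and field names are hypothetical.

```python
# Hypothetical Pareto front: the values are made up for illustration
# and are NOT results from the paper. All three fields are minimized.
front = [
    {"p": 0.1, "error": 0.010, "latency_ms": 0.002},
    {"p": 0.2, "error": 0.003, "latency_ms": 0.003},
    {"p": 1.0, "error": 0.002, "latency_ms": 0.010},
]
KEYS = ("p", "error", "latency_ms")

def normalized_cost(sol, solutions):
    """Sum of min-max normalized objectives (assumed balancing rule)."""
    total = 0.0
    for key in KEYS:
        lo = min(s[key] for s in solutions)
        hi = max(s[key] for s in solutions)
        # An objective that is constant across the front carries no trade-off.
        total += 0.0 if hi == lo else (sol[key] - lo) / (hi - lo)
    return total

# Minimal-feature variant: smallest feature fraction on the front.
minimal = min(front, key=lambda s: s["p"])
# Balanced variant: lowest combined normalized cost.
balanced = min(front, key=lambda s: normalized_cost(s, front))

print(minimal["p"], balanced["p"])  # 0.1 0.2
```

Other balancing rules, such as a knee-point search or application-specific weights, would fit the same structure by swapping out `normalized_cost`.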


In summary, OIP-AutoFS formalizes a scalable and automated feature selection strategy, optimized for both importance and dimensionality constraints, and tightly integrated within multi-objective AutoML pipelines where real-world efficiency, accuracy, and reliability are all critical. Its demonstrably superior performance in AutoML-based intrusion detection systems for IoT and edge contexts attests to its utility in contemporary autonomous security frameworks (Yang et al., 11 Nov 2025).
