D_TreeEVO: Hybrid Decision Tree Evolution
- D_TreeEVO is a hybrid metaheuristic that combines population-based feature selection via EVO with pruned decision trees for efficient classification in high-dimensional IDS tasks.
- It employs dynamic decay rules and energy barrier dynamics to optimize binary feature masks, significantly reducing feature sets while enhancing accuracy.
- The unified pipeline ensures streamlined preprocessing, rapid model training, and interpretable results, outperforming baseline methods on benchmark intrusion detection datasets.
D_TreeEVO is a hybrid decision tree metaheuristic that integrates the Energy Valley Optimizer (EVO) for population-based feature selection with a conventional (typically pruned) decision tree classifier, targeting high-performance learning, especially for high-dimensional tabular tasks such as intrusion detection in cloud computing environments. D_TreeEVO combines wrapper-based feature selection, evolutionary search, and supervised classification to address key issues of dimensionality, complexity, runtime, and predictive performance in modern data-driven security applications (Al-Husseini, 24 Jun 2025).
1. Model Architecture and Workflow
D_TreeEVO consists of two principal components: the Energy Valley Optimizer (EVO) for wrapper-based feature selection and a decision tree (DT) classifier. The overall process follows these steps:
- Data Preprocessing: Remove identifiers (e.g., IPs, timestamps), handle missing values, encode categorical variables, balance classes via downsampling, and scale numeric features using Min–Max normalization.
- Feature Selection via EVO: A population of candidate binary feature-selection masks is maintained; each mask encodes a subset of the total features.
- Wrapper Evaluation: For each mask, a decision tree is trained (typically with cross-validation), and performance metrics (e.g., accuracy, detection rate, false positive/negative rates) are computed.
- Population Update: EVO moves candidate masks according to energy-barrier dynamics and neighborhood relationships, balancing exploration and exploitation.
- Final Selection: After convergence, the best mask is used to train a final decision tree classifier on the selected features.
- Model Evaluation: Predictive performance is measured on a held-out test set using accuracy, F1, detection/recall, and false alarm rates.
This modular workflow enables fast training, interpretable models, and robust handling of large-scale, high-dimensional data.
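The wrapper-evaluation step at the heart of this workflow can be sketched as follows. This is a minimal illustration using scikit-learn conventions on synthetic data; the EVO search itself is elided, with a random mask standing in for a candidate produced by the optimizer:

```python
# Minimal sketch of wrapper evaluation: score one binary feature mask
# by the cross-validated accuracy of a depth-limited decision tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def evaluate_mask(mask: np.ndarray) -> float:
    """Cross-validated DT accuracy on the features selected by `mask`."""
    if not mask.any():                 # empty feature subsets are invalid
        return 0.0
    tree = DecisionTreeClassifier(max_depth=10, random_state=0)
    return cross_val_score(tree, X[:, mask.astype(bool)], y, cv=5).mean()

mask = rng.integers(0, 2, size=X.shape[1])   # stand-in for an EVO candidate
print(f"{mask.sum()} features -> CV accuracy {evaluate_mask(mask):.3f}")
```

In the full pipeline, EVO would call `evaluate_mask` for every candidate in the population at each iteration, which is why the cheap-to-train decision tree is a natural wrapper model.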
2. Energy Valley Optimizer (EVO): Algorithmic Formulation
EVO operates as a population-based metaheuristic, where each population member (particle) is a binary selection vector indicating the active features. The objective is to minimize a cost function; for IDS this is typically a weighted sum of classification error, false positive rate (FPR), and false negative rate (FNR):

$f(\mathbf{x}) = w_1\,(1 - \mathrm{Acc}(\mathbf{x})) + w_2\,\mathrm{FPR}(\mathbf{x}) + w_3\,\mathrm{FNR}(\mathbf{x})$
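A minimal sketch of such a weighted cost, computed from a binary confusion matrix; the specific weight values here are illustrative, not the ones used in the cited work:

```python
# Illustrative wrapper fitness: weighted sum of error rate, FPR, and FNR.
# Lower is better; labels are binary with 1 = attack, 0 = benign.
import numpy as np

def fitness(y_true: np.ndarray, y_pred: np.ndarray,
            w1: float = 0.6, w2: float = 0.2, w3: float = 0.2) -> float:
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false alarms on benign
    fnr = fn / (fn + tp) if (fn + tp) else 0.0   # missed attacks
    return w1 * (1 - acc) + w2 * fpr + w3 * fnr

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
print(round(fitness(y_true, y_pred), 4))   # -> 0.3333
```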
Particles update according to four decay-mimetic rules (alpha, gamma, and two beta variants), combining exploitation (movement toward the global best mask $\mathbf{x}_{best}$), exploration (perturbation toward a random neighbor $\mathbf{x}_{ng}$), and neighborhood/centroid terms. Update steps are stochastically weighted and thresholded back to binary.
Update equations (schematically, with $\mathbf{x}_i^{t}$ the current mask of particle $i$ at iteration $t$):
- Alpha decay: $\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + r_1\,(\mathbf{x}_{best} - \mathbf{x}_i^{t})$
- Gamma decay: $\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + r_2\,(\mathbf{x}_{ng} - \mathbf{x}_i^{t})$
- Beta decay, centroid: $\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \frac{r_3\,(\mathbf{x}_{best} - \mathbf{x}_i^{t}) + r_4\,(\mathbf{x}_{CP} - \mathbf{x}_i^{t})}{SL_i}$
- Beta decay, neighbor: $\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \frac{r_5\,(\mathbf{x}_{best} - \mathbf{x}_i^{t}) + r_6\,(\mathbf{x}_{ng} - \mathbf{x}_i^{t})}{SL_i}$
where $r_1, \dots, r_6$ are random reals in $[0, 1]$, $\mathbf{x}_{best}$ is the current best solution, $\mathbf{x}_{CP}$ is the population centroid, and $SL_i$ is a stability factor.
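One decay-style move with thresholding back to binary can be sketched as follows. This is schematic: the full EVO bookkeeping (neutron enrichment levels, stability bounds, rule selection) is omitted, and the exact step form is illustrative rather than the published one:

```python
# Schematic decay-style EVO step for a binary feature mask: take a
# real-valued move toward best/centroid/neighbor, scale by a stability
# factor, then threshold back to {0, 1} via a sigmoid transfer function.
import numpy as np

rng = np.random.default_rng(1)

def decay_step(x: np.ndarray, x_best: np.ndarray, x_cp: np.ndarray,
               x_ng: np.ndarray, sl: float) -> np.ndarray:
    r = rng.random(3)
    step = (r[0] * (x_best - x)        # exploitation toward global best
            + r[1] * (x_cp - x)        # centroid (neighborhood) term
            + r[2] * (x_ng - x)        # exploration via random neighbor
            ) / max(sl, 1e-9)          # stability factor scaling
    prob = 1.0 / (1.0 + np.exp(-(x + step)))     # sigmoid transfer
    return (rng.random(x.size) < prob).astype(int)

d = 10
pop = rng.integers(0, 2, size=(5, d)).astype(float)
x_best, x_cp, x_ng = pop[0], pop.mean(axis=0), pop[2]
new_mask = decay_step(pop[1], x_best, x_cp, x_ng, sl=1.5)
print(new_mask)
```

The sigmoid transfer is one common way to binarize continuous metaheuristic moves for feature selection; any monotone transfer function with stochastic rounding would serve the same purpose.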
Empirically, EVO reduces feature sets (e.g., CIC-DDoS2019: 88 → 38 features, CSE-CIC-IDS2018: 80 → 43), improving both computational efficiency and classification rates (Alhusseini et al., 3 Jan 2026).
3. Decision Tree Component: Splitting and Hyperparameters
The DT classifier uses either Gini impurity,
$\mathrm{Gini}(t) = 1 - \sum_{k} p_k^2,$
or entropy/information gain, $H(t) = -\sum_{k} p_k \log_2 p_k$, where $p_k$ is the proportion of class $k$ at node $t$. Splits are greedy, maximizing the impurity reduction (Gini decrease or IG) over the features selected by EVO. The recommended hyperparameters, based on evaluation on cloud-IDS streams, are:
- max_depth = 10 (to prevent overfitting),
- min_samples_split = 0.05 × N,
- min_samples_leaf = 0.02 × N (with N the number of training samples),
- no post-pruning (early stopping via above constraints).
This configuration balances tree expressivity with statistical robustness, particularly in high-dimensional settings after feature selection.
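Expressed with scikit-learn, this configuration is a one-liner; note that float values of `min_samples_split` and `min_samples_leaf` are interpreted as fractions of the training-set size, which matches the 0.05 × N and 0.02 × N rules (synthetic data used here for illustration):

```python
# Decision tree with the recommended hyperparameters: depth-limited,
# fraction-based split/leaf minimums, no post-pruning.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

tree = DecisionTreeClassifier(
    criterion="gini",          # or "entropy" for information gain
    max_depth=10,              # early stopping instead of post-pruning
    min_samples_split=0.05,    # float => 0.05 * n_samples
    min_samples_leaf=0.02,     # float => 0.02 * n_samples
    random_state=0,
).fit(X, y)

print(tree.get_depth(), tree.get_n_leaves())
```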
4. End-to-End D_TreeEVO Pipeline: Practical Realization
Pipeline steps are as follows:
- Data loading: Ingestion from public IDS datasets (CIC-DDoS2019, CSE-CIC-IDS2018, NSL-KDD), typically downsampling to manage class imbalance.
- Preprocessing: Removal of redundant information, imputation, encoding, balance correction, and scaling.
- Feature selection: EVO run with population size and iteration count set according to dataset scale (e.g., 32-run ensembles in benchmarking).
- Model training: Stratified 80/20 train/test split; DT fitted on features selected by EVO.
- Evaluation protocol: metrics computed per run, with mean and standard deviation reported across 24–32 repeats (random seeds).
All steps are implemented in a unified workflow, achieving high computational efficiency and interpretability.
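The repeated-runs evaluation protocol above can be sketched as stratified 80/20 splits over multiple seeds (fewer repeats here than the 24–32 used in benchmarking, and synthetic data in place of the IDS datasets):

```python
# Sketch of the evaluation protocol: repeated stratified 80/20 splits,
# reporting mean and standard deviation of test accuracy across seeds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=25, random_state=0)

scores = []
for seed in range(8):                      # papers report 24-32 repeats
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = DecisionTreeClassifier(max_depth=10, random_state=seed)
    clf.fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, clf.predict(X_te)))

print(f"accuracy = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```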
5. Empirical Benchmarking and Comparative Performance
D_TreeEVO demonstrates state-of-the-art performance on benchmark IDS datasets when compared with baseline ML and metaheuristic combinations.
| Dataset | Model | Accuracy | F1-score | # Features |
|---|---|---|---|---|
| CIC-DDoS2019 | D_TreeEVO | 99.13% | 98.94% | 38 |
| CIC-DDoS2019 | SVMEVO | 95.60% | 94.99% | 38 |
| CIC-DDoS2019 | RFEVO | 95.86% | 95.34% | 38 |
| CSE-CIC-IDS2018 | D_TreeEVO | 99.78% | 99.70% | 43 |
| CSE-CIC-IDS2018 | SVMEVO | 98.50% | 98.51% | 43 |
Confusion matrices show >99% correct classification for most classes, with rare misclassification (<0.3%). D_TreeEVO outperforms both deep learning and other hybrid methods on these benchmarks (Al-Husseini, 24 Jun 2025, Alhusseini et al., 3 Jan 2026).
6. Analysis of Trade-offs, Limitations, and Observed Behavior
- EVO vs GWO: EVO achieves faster and more reliable convergence than Grey Wolf Optimizer (GWO) in IDS feature selection, due to its dynamic energy/barrier landscape and combined centroid/global/neighborhood updates (Al-Husseini, 24 Jun 2025).
- Detection Rate vs False Alarm: D_TreeEVO yields slightly lower detection rates but significantly reduced false alarm rates compared to GWO-based approaches, a trade-off suitable for operational IDS contexts.
- Computational overhead: EVO incurs additional metaheuristic search cost but leads to substantially reduced dimensionality, improving training/inference time post-selection.
- Robustness: No formal significance tests are reported, but the consistency of gains across runs and low error rates are suggestive of true improvement.
- Scalability: D_TreeEVO's ability to operate on large-scale, imbalanced datasets with high-dimensional feature spaces demonstrates practical viability.
7. Limitations, Open Questions, and Prospects
- Hyperparameter tuning: EVO-specific meta-parameters (population size, decay coefficients, iteration count) must be tuned per dataset.
- Generalization: The approach is sensitive to very low-frequency class instances; extending to ensembles or deep learners is posed as future work (Al-Husseini, 24 Jun 2025, Alhusseini et al., 3 Jan 2026).
- Online adaptation: The current workflow does not support online/streaming feature re-selection; incremental model adaptation under concept drift is an open avenue.
- Significance: While empirical differences are substantial, formal statistical validation (e.g., McNemar's test, paired t-test) is not reported.
A plausible implication is that D_TreeEVO, by integrating a highly adaptive evolutionary feature selector with statistically robust tree learners, provides a flexible, interpretable, and performant solution for high-dimensional classification domains, especially in security-sensitive cloud environments. The demonstrated rapid convergence and consistent gains across multiple datasets position D_TreeEVO as a leading framework for hybrid metaheuristic-ML pipelines (Al-Husseini, 24 Jun 2025, Alhusseini et al., 3 Jan 2026).