
D_TreeEVO: Hybrid Decision Tree Evolution

Updated 10 January 2026
  • D_TreeEVO is a hybrid metaheuristic that combines population-based feature selection via EVO with pruned decision trees for efficient classification in high-dimensional IDS tasks.
  • It employs dynamic decay rules and energy barrier dynamics to optimize binary feature masks, significantly reducing feature sets while enhancing accuracy.
  • The unified pipeline ensures streamlined preprocessing, rapid model training, and interpretable results, outperforming baseline methods on benchmark intrusion detection datasets.

D_TreeEVO is a hybrid decision tree metaheuristic that integrates the Energy Valley Optimizer (EVO) for population-based feature selection with a conventional (typically pruned) decision tree classifier, targeting high-performance learning on high-dimensional tabular tasks such as intrusion detection in cloud computing environments. It combines wrapper-based feature selection, evolutionary search, and supervised classification to address key issues of dimensionality, complexity, runtime, and predictive performance in modern data-driven security applications (Al-Husseini, 24 Jun 2025; Alhusseini et al., 3 Jan 2026).

1. Model Architecture and Workflow

D_TreeEVO consists of two principal components: the Energy Valley Optimizer (EVO) for wrapper-based feature selection and a decision tree (DT) classifier. The overall process follows these steps:

  1. Data Preprocessing: Remove identifiers (e.g., IPs, timestamps), handle missing values, encode categorical variables, balance classes via downsampling, and scale numeric features using Min–Max normalization.
  2. Feature Selection via EVO: A population of candidate binary feature-selection masks is maintained; each mask encodes a subset of the total features.
  3. Wrapper Evaluation: For each mask, a decision tree is trained (typically with cross-validation), and performance metrics (e.g., accuracy, detection rate, false positive/negative rates) are computed.
  4. Population Update: EVO moves candidate masks according to energy-barrier dynamics and neighborhood relationships, balancing exploration and exploitation.
  5. Final Selection: After convergence, the best mask is used to train a final decision tree classifier on the selected features.
  6. Model Evaluation: Predictive performance is measured on a held-out test set using accuracy, F1, detection/recall, and false alarm rates.

This modular workflow enables fast training, interpretable models, and robust handling of large-scale, high-dimensional data.
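Two of the preprocessing steps above can be sketched directly; this is a minimal illustration of Min–Max scaling and of applying a binary feature mask, with made-up data and helper names (not from the paper):

```python
# Minimal sketch of two preprocessing steps: Min-Max scaling of a numeric
# column, and selecting the features a binary mask marks as active.
# Data values and function names are illustrative assumptions.

def min_max_scale(column):
    """Scale a list of numbers into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map everything to 0.0
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]

def apply_mask(row, mask):
    """Keep only the features whose mask bit is 1."""
    return [x for x, keep in zip(row, mask) if keep]

scaled = min_max_scale([10.0, 20.0, 30.0])                 # -> [0.0, 0.5, 1.0]
subset = apply_mask([1.2, 3.4, 5.6, 7.8], [1, 0, 1, 0])    # -> [1.2, 5.6]
```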

2. Energy Valley Optimizer (EVO): Algorithmic Formulation

EVO operates as a population-based metaheuristic in which each population member (particle) is a binary selection vector $X_i \in \{0,1\}^D$ indicating the active features. The objective is to minimize a cost function $J(S)$; for IDS, this is typically a weighted sum of $1 - \text{Accuracy}$, the false positive rate (FPR), and the false negative rate (FNR):

$$J(S) = w_1\left(1-\text{Accuracy}(S)\right) + w_2\,\text{FPR}(S) + w_3\,\text{FNR}(S)$$
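The cost function can be written down directly; in this minimal sketch the weight values w1–w3 are illustrative assumptions (the section does not specify them):

```python
# Hedged sketch of the wrapper cost J(S); lower is better.
# The default weights w1..w3 are illustrative assumptions.

def cost(accuracy, fpr, fnr, w1=0.6, w2=0.2, w3=0.2):
    """J(S) = w1*(1 - accuracy) + w2*FPR + w3*FNR."""
    return w1 * (1.0 - accuracy) + w2 * fpr + w3 * fnr

# A mask giving higher accuracy and lower error rates scores lower:
j_good = cost(accuracy=0.99, fpr=0.01, fnr=0.02)
j_bad  = cost(accuracy=0.90, fpr=0.10, fnr=0.10)
```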

Particles update according to four decay-mimetic rules (alpha, gamma, and two beta variants), combining exploitation (movement toward the global best mask $X_\mathrm{BS}$), exploration (perturbation by a random neighbor $X_\mathrm{NG}$), and neighborhood/centroid terms. Update steps are stochastically weighted and thresholded back to binary.

Update equations:

  • Alpha decay: $X_i \leftarrow X_i + T_1(X_\mathrm{BS} - X_i)$
  • Gamma decay: $X_i \leftarrow X_i + T_2(X_\mathrm{NG} - X_i)$
  • Beta decay, centroid: $X_i \leftarrow X_i + \dfrac{T_3(X_\mathrm{BS} - X_\mathrm{CP})}{SL_i}$
  • Beta decay, neighbor: $X_i \leftarrow X_i + T_4(X_\mathrm{BS} - X_\mathrm{NG})$

where $T_1,\dots,T_4$ are random reals in $[0,1]$, $X_\mathrm{BS}$ is the current best solution, $X_\mathrm{CP}$ is the population centroid, and $SL_i$ is a stability factor.
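One iteration of these updates can be sketched in code; this minimal example implements only the alpha rule on a binary mask, and the 0.5 binarization threshold (and fixed seed) are illustrative assumptions, not details from the paper:

```python
import random

# Hedged sketch of one EVO position update followed by binarization.
# Shown: the alpha rule X_i <- X_i + T1*(X_BS - X_i); the 0.5 threshold
# used to map the continuous step back to a binary mask is an assumption.

def alpha_update(x_i, x_bs, rng):
    t1 = rng.random()                                   # T1 ~ U[0, 1]
    moved = [xi + t1 * (b - xi) for xi, b in zip(x_i, x_bs)]
    return [1 if v >= 0.5 else 0 for v in moved]        # threshold to binary

rng = random.Random(0)
mask = alpha_update([0, 1, 0, 1], [1, 1, 0, 0], rng)
```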

Empirically, EVO reduces feature sets (e.g., CIC-DDoS2019: 88 → 38 features, CSE-CIC-IDS2018: 80 → 43), improving both computational efficiency and classification rates (Alhusseini et al., 3 Jan 2026).

3. Decision Tree Component: Splitting and Hyperparameters

The DT classifier uses either Gini impurity,

$$G(t) = 1 - \sum_{k=1}^{C} p_{k\mid t}^{2}$$

or entropy/information gain. Splits are greedy, maximizing the reduction in Gini impurity ($\Delta\text{Gini}$) or the information gain over the features selected by EVO. The recommended hyperparameters, based on evaluation on cloud-IDS streams, are:

  • max_depth = 10 (to prevent overfitting),
  • min_samples_split = 0.05 × $N$,
  • min_samples_leaf = 0.02 × $N$ (with $N$ the number of training samples),
  • no post-pruning (early stopping via above constraints).

This configuration balances tree expressivity with statistical robustness, particularly in high-dimensional settings after feature selection.
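As an illustration, the Gini impurity defined above can be computed directly from the class labels at a node; the labels here are hypothetical:

```python
from collections import Counter

# Illustrative computation of the Gini impurity G(t) from this section;
# the class labels are made-up examples.

def gini(labels):
    """G(t) = 1 - sum_k p_{k|t}^2 over class proportions at node t."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

g_pure  = gini(["attack"] * 8)                      # pure node -> 0.0
g_mixed = gini(["attack"] * 4 + ["benign"] * 4)     # maximally mixed (2 classes)
```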

4. End-to-End D_TreeEVO Pipeline: Practical Realization

Pipeline steps are as follows:

  1. Data loading: Ingestion from public IDS datasets (CIC-DDoS2019, CSE-CIC-IDS2018, NSL-KDD), typically downsampling to manage class imbalance.
  2. Preprocessing: Removal of redundant information, imputation, encoding, balance correction, and scaling.
  3. Feature selection: EVO is run with population size and iteration count set according to dataset scale (e.g., ensembles of 32 runs in benchmarking).
  4. Model training: Stratified 80/20 train/test split; DT fitted on features selected by EVO.
  5. Evaluation protocol: Metrics computed per run—mean and standard deviation across 24–32 repeats (random seeds).

All steps are implemented in a unified workflow, achieving high computational efficiency and interpretability.
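The stratified 80/20 split in step 4 can be sketched in plain Python; the helper name, seed, and label values are illustrative assumptions:

```python
import random
from collections import defaultdict

# Hedged sketch of a stratified 80/20 train/test split that preserves
# per-class proportions. Seed and labels are illustrative.

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = int(round(len(idxs) * test_frac))   # per-class test share
        test.extend(idxs[:cut])
        train.extend(idxs[cut:])
    return sorted(train), sorted(test)

labels = ["attack"] * 80 + ["benign"] * 20
train_idx, test_idx = stratified_split(labels)
```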

5. Empirical Benchmarking and Comparative Performance

D_TreeEVO demonstrates state-of-the-art performance on benchmark IDS datasets when compared with baseline ML and metaheuristic combinations.

| Dataset | Model | Accuracy | F1-score | # Features |
|---|---|---|---|---|
| CIC-DDoS2019 | D_TreeEVO | 99.13% | 98.94% | 38 |
| CIC-DDoS2019 | SVMEVO | 95.60% | 94.99% | 38 |
| CIC-DDoS2019 | RFEVO | 95.86% | 95.34% | 38 |
| CSE-CIC-IDS2018 | D_TreeEVO | 99.78% | 99.70% | 43 |
| CSE-CIC-IDS2018 | SVMEVO | 98.50% | 98.51% | 43 |

Confusion matrices show >99% correct classification for most classes, with rare misclassification (<0.3%). D_TreeEVO outperforms both deep learning and other hybrid methods on these benchmarks (Al-Husseini, 24 Jun 2025, Alhusseini et al., 3 Jan 2026).
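The reported metrics all derive from per-class confusion counts; this minimal sketch shows those derivations, with made-up counts for illustration:

```python
# Hedged sketch of the evaluation metrics used in benchmarking; the
# confusion counts below are illustrative, not results from the papers.

def ids_metrics(tp, fp, tn, fn):
    """Accuracy, detection rate (recall), FPR, FNR, and F1 from counts."""
    acc  = (tp + tn) / (tp + fp + tn + fn)
    dr   = tp / (tp + fn)          # detection rate = recall on attacks
    fpr  = fp / (fp + tn)          # false alarm rate
    fnr  = fn / (fn + tp)
    prec = tp / (tp + fp)
    f1   = 2 * prec * dr / (prec + dr)
    return acc, dr, fpr, fnr, f1

acc, dr, fpr, fnr, f1 = ids_metrics(tp=990, fp=5, tn=995, fn=10)
```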

6. Analysis of Trade-offs, Limitations, and Observed Behavior

  • EVO vs GWO: EVO achieves faster and more reliable convergence than Grey Wolf Optimizer (GWO) in IDS feature selection, due to its dynamic energy/barrier landscape and combined centroid/global/neighborhood updates (Al-Husseini, 24 Jun 2025).
  • Detection Rate vs False Alarm: D_TreeEVO yields slightly lower detection rates but significantly reduced false alarm rates compared to GWO-based approaches, a trade-off suitable for operational IDS contexts.
  • Computational overhead: EVO incurs additional metaheuristic search cost but leads to substantially reduced dimensionality, improving training/inference time post-selection.
  • Robustness: No formal significance tests are reported, but the consistency of gains across runs and low error rates are suggestive of true improvement.
  • Scalability: D_TreeEVO's ability to operate on large-scale, imbalanced datasets with high-dimensional feature spaces demonstrates practical viability.

7. Limitations, Open Questions, and Prospects

  • Hyperparameter tuning: EVO-specific meta-parameters (population size, decay coefficients, iteration count) must be tuned per dataset.
  • Generalization: The approach is sensitive to very low-frequency class instances; extending to ensembles or deep learners is posed as future work (Al-Husseini, 24 Jun 2025, Alhusseini et al., 3 Jan 2026).
  • Online adaptation: The current workflow does not support online/streaming feature re-selection; incremental model adaptation under concept drift is an open avenue.
  • Significance: While empirical differences are substantial, formal statistical validation (e.g., McNemar’s test, a paired $t$-test) is not reported.

A plausible implication is that D_TreeEVO, by integrating a highly adaptive evolutionary feature selector with statistically robust tree learners, provides a flexible, interpretable, and performant solution for high-dimensional classification domains, especially in security-sensitive cloud environments. The demonstrated rapid convergence and consistent gains across multiple datasets position D_TreeEVO as a leading framework for hybrid metaheuristic-ML pipelines (Al-Husseini, 24 Jun 2025, Alhusseini et al., 3 Jan 2026).
