Multi-Stage IDS for Connected Autonomous Vehicles
- Multi-stage IDS for CAVs is a layered security approach that integrates supervised, unsupervised, and federated learning techniques to detect both known and zero-day cyber-attacks in vehicular networks.
- These systems employ sequential detection mechanisms using deep learning, GANs, clustering, and statistical filtering to achieve high accuracy and low false-alarm rates in real-time settings.
- They leverage advanced preprocessing, feature engineering, and model compression to meet automotive hardware constraints while ensuring robust, adaptive intrusion detection.
A multi-stage Intrusion Detection System (IDS) for Connected and Autonomous Vehicles (CAVs) systematically combines complementary detection mechanisms—typically integrating supervised and unsupervised learning and, in advanced designs, federated or resource-aware model management—to identify a wide range of cyber-attacks on in-vehicle and vehicular networks. These systems are architected to address the spectrum of threat exposure, from known, previously observed attack signatures to zero-day manipulations affecting safety-critical automotive data streams.
1. Multi-Stage IDS Architectures: Principles and Variants
Multi-stage IDS for CAVs are designed as pipelines that enable granular, layered analysis of data flowing through vehicular networks. Each stage typically serves a distinct detection or filtering function, addressing the dual requirements of high-accuracy known-attack detection (signature-based, supervised learning) and robust discovery of novel, zero-day attacks (anomaly-based, unsupervised or semi-supervised learning). Canonical architectures include:
- Two-stage deep-learning IDS: First stage detects previously seen attacks with a supervised Artificial Neural Network (ANN) classifier, followed by a Long Short-Term Memory (LSTM) autoencoder that flags unknown/novel patterns based on reconstruction error (Althunayyan et al., 2024).
- Hybrid GAN–sequence classifier pipelines: An unsupervised GAN (BiGAN with WGAN-GP objective) provides coarse anomaly filtering; only the flagged windows are then subjected to fine-grained multi-class CNN-LSTM classification. Compression and quantization techniques enable deployment on embedded automotive hardware (S et al., 30 Dec 2025).
- Feature-engineered multi-tier hybrids: Classical pipelines deploy tree-ensemble classifiers for signature detection and anomaly-based clustering (mini-batch k-means) for zero-day defense; false-positive/false-negative correction can be layered with biased classifier postprocessing (Yang et al., 2021).
- Sensor-level multi-stage AD: Time-series anomaly detection combines amplification blocks, omni-scale CNNs that exhaustively search all kernel sizes, and a Kalman filter to localize compromised sensors—enabling both attack detection and forensic sensor identification (Rahman et al., 2024).
Major assemblies may also be enhanced by privacy-preserving, federated training (e.g., hierarchical federated learning—H-FL) to support continual, fleet-wide adaptation without raw data aggregation (Althunayyan et al., 2024).
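The two-stage dispatch pattern common to these architectures can be sketched as follows. This is a minimal illustration with toy stand-in models; the `can_id` routing rule, the payload-deviation score, and the threshold value are assumptions for demonstration, not taken from the cited systems:

```python
import numpy as np

def two_stage_detect(message, stage1_classify, stage2_score, threshold):
    """Route a message through the canonical two-stage IDS pattern.

    stage1_classify: supervised model -> known-attack label or "Normal".
    stage2_score:    unsupervised model -> anomaly score (e.g., reconstruction error).
    threshold:       anomaly score above which traffic is flagged as zero-day.
    """
    label = stage1_classify(message)
    if label != "Normal":
        return label                      # known attack caught by stage 1
    if stage2_score(message) > threshold:
        return "Anomaly"                  # unseen pattern caught by stage 2
    return "Normal"

# Toy stand-ins for the two models (illustration only).
stage1 = lambda m: "DoS" if m["can_id"] == 0x000 else "Normal"
stage2 = lambda m: float(np.abs(np.asarray(m["payload"]) - 128).mean())
```

Only traffic that stage 1 considers benign reaches the anomaly detector, which is what keeps per-message latency low on known-attack traffic.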
2. Data Processing, Feature Engineering, and Preprocessing
Effective multi-stage IDS depend on judicious feature selection and normalization processes due to the diversity and noise characteristics of automotive data:
- Bus-level features: CAN-bus messages are interpreted as fixed-dimensional vectors (e.g., [CAN ID, D0–D7]), with payload bytes (0–255) label-encoded and scaled via z-score standardization, $x' = (x - \mu)/\sigma$ (Althunayyan et al., 2024).
- Physical-signal sequences: CAV-AD processes time-series sensor windows, e.g., speed, position, or acceleration; an “amplification block” increases anomaly contrast for small windowed deviations (Rahman et al., 2024).
- High-dimensional engineered features: External and V2X data streams are summarized into flow features or principal components (e.g., via Information Gain filtering, FCBF, kernel PCA) for tree-based hybrid systems (Yang et al., 2021).
- Motion/state context: Sequence tensors may encode motion states (position, speed, acceleration, heading) for windowed anomaly assessment (S et al., 30 Dec 2025).
Preprocessing steps can include outlier removal, SMOTE oversampling for minority classes, and, for ensemble systems, one-hot encoding and feature scaling to [0,1] ranges. Amplification functions, which scale windowed deviations that exceed a small threshold, are employed to elevate detection sensitivity to minimal-magnitude attacks (Rahman et al., 2024).
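The standardization and amplification steps above can be sketched as follows. The amplification form is a hedged illustration: the gain `k` and deviation threshold `eps` are assumed values, and the cited paper's exact function is not reproduced here:

```python
import numpy as np

def standardize(x, mu, sigma):
    # z-score standardization x' = (x - mu) / sigma, as applied to payload bytes
    return (x - mu) / sigma

def amplify(x, baseline, k=5.0, eps=0.5):
    # Sketch of an amplification block: deviations from the local baseline
    # larger than eps are scaled by k to raise anomaly contrast.
    # (k and eps are illustrative, not the paper's parameters.)
    dev = x - baseline
    return np.where(np.abs(dev) > eps, baseline + k * dev, x)

payload = np.array([0, 255, 128, 64, 32, 16, 8, 4], dtype=float)  # CAN bytes 0-255
z = standardize(payload, payload.mean(), payload.std())
```

After standardization the window has zero mean and unit variance, so a single anomaly threshold can be reused across signals of different scales.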
3. Stage-wise Detection Mechanisms
The detection stages are tightly specialized:
- Supervised signature/attack recognition: Fast, shallow artificial neural networks (ANNs) classify "Normal" vs. "DoS", "Fuzzing", "RPM spoof", and "Gear spoof" traffic. Categorical cross-entropy loss and the Adam optimizer are standard, with a fixed decision threshold applied to the output (Althunayyan et al., 2024). Ensemble models stack tree learners (DT, RF, ET, XGB) with meta-learners tuned via Bayesian optimization (Yang et al., 2021).
- Unsupervised anomaly detectors: LSTM autoencoders (250k parameters) minimize the reconstruction error $\mathcal{L} = \lVert x - \hat{x} \rVert^2$; anomalies are flagged when the error exceeds a threshold $\tau = \mu_e + \sigma_e$, the mean plus one standard deviation of the training-set errors (Althunayyan et al., 2024). GAN reconstruction combines MSE and Mahalanobis distance for robust detection, with thresholds derived from training-error IQRs (S et al., 30 Dec 2025).
- Fine-grained multi-classification: CNN-LSTM models operate on the flagged temporal windows, using label-smoothing or focal loss to enhance class separation (S et al., 30 Dec 2025).
- Sensor anomaly localization: Kalman filters predict each sensor's expected reading, flagging a sensor when the residual between the measured and predicted values exceeds a threshold set from the training residual standard deviation (Rahman et al., 2024).
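The thresholding logic of the unsupervised stage can be sketched directly: fit the threshold $\tau = \mu_e + \sigma_e$ on benign-traffic reconstruction errors, then flag anything above it. The error distribution below is synthetic, standing in for a trained autoencoder's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Reconstruction errors of a (hypothetical) trained LSTM-AE on benign traffic.
train_err = rng.normal(loc=0.05, scale=0.01, size=10_000)

# tau = mean + one standard deviation of training-set errors, per the text.
tau = train_err.mean() + train_err.std()

def is_anomaly(err, tau=tau):
    # A window is anomalous when its reconstruction error exceeds tau.
    return err > tau
```

Because $\tau$ is fit only on benign data, no attack labels are needed, which is what lets this stage catch zero-day patterns.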
Table 1: Illustrative Stage Configuration (select systems)
| IDS Scheme | Stage 1 | Stage 2 | Extra Stage(s) |
|---|---|---|---|
| (Althunayyan et al., 2024) | ANN (supervised) | LSTM-AE (unsup.) | H-FL for update, privacy |
| (S et al., 30 Dec 2025) | BiGAN/WGAN-GP | CNN-LSTM | Model compression |
| (Yang et al., 2021) | Tree-ensemble | CL-k-means + cls | Feature eng., stacking |
| (Rahman et al., 2024) | Amplify block | O-OS-CNN | Kalman filter |
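The Kalman-filter localization stage from Table 1 can be sketched with a scalar random-walk filter per sensor: the innovation (measurement minus prediction) is compared against a multiple of the clean-data residual spread. The noise variances `q` and `r` and the 3-sigma multiplier are illustrative assumptions, not the cited paper's settings:

```python
import numpy as np

def kalman_residuals(z, q=1e-4, r=1e-2):
    """Scalar Kalman filter over one sensor stream; returns innovation residuals.

    q: process-noise variance, r: measurement-noise variance (illustrative values).
    """
    x, p = z[0], 1.0
    res = []
    for zt in z[1:]:
        p = p + q                    # predict (random-walk state model)
        res.append(zt - x)           # innovation: measurement minus prediction
        k = p / (p + r)              # Kalman gain
        x = x + k * (zt - x)         # update state estimate
        p = (1 - k) * p
    return np.array(res)

# A spoofed spike in an otherwise steady speed signal (toy data).
speed = np.concatenate([np.full(50, 20.0), [45.0], np.full(10, 20.0)])
res = kalman_residuals(speed)
sigma = res[:49].std() + 1e-9        # residual spread on the clean prefix
flags = np.abs(res) > 3 * sigma      # flag sensor when residual >> sigma
```

Running one such filter per sensor gives the per-sensor attribution described above: only the stream whose residual spikes is blamed.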
4. Federated Learning, Compression, and Deployment Strategies
Meeting embedded, real-time automotive constraints necessitates system-level optimization:
- Hierarchical Federated Learning (H-FL): Deploys a three-tier model (vehicle endpoints, edge clusters by CAN profile, central cloud aggregation). Federated averaging is performed at the edge and global levels, weighting each client's model by its sample count: $w^{\text{edge}} = \sum_k \frac{n_k}{n}\, w_k$ and $w^{\text{global}} = \sum_j \frac{m_j}{m}\, w^{\text{edge}}_j$ (Althunayyan et al., 2024). Privacy is maintained by retaining all raw CAN data on vehicles; model updates may incorporate secure aggregation, differential privacy, or homomorphic encryption.
- Model compression: Structured pruning (removing the 40% of conv filters with the lowest L1 norms per layer) plus static 8-bit quantization, calibrated on training statistics, yields a substantial model-size reduction and a roughly 50% decrease in inference time with minimal accuracy loss, on the order of 5% (S et al., 30 Dec 2025).
- Real-time inference: Memory and load metrics confirm practical deployment: the entire two-stage DL model fits within roughly 3 MB of flash and tens of MB of RAM, with end-to-end per-message latency of $1$–$5$ ms, meeting in-vehicle deadlines (Althunayyan et al., 2024). The compressed GAN-LSTM pipeline achieves per-vehicle processing in $0.195$ s on Jetson Nano-class ECUs (S et al., 30 Dec 2025).
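The aggregation and compression steps above can be sketched together. Sample counts, layer shapes, and pruning fraction are toy values; the quantization uses a single symmetric per-tensor scale, which is one common choice rather than the cited system's exact scheme:

```python
import numpy as np

def fedavg(weights, counts):
    """Sample-count-weighted federated averaging: w = sum_k (n_k / n) w_k."""
    counts = np.asarray(counts, dtype=float)
    coef = counts / counts.sum()
    return sum(c * w for c, w in zip(coef, weights))

def prune_filters(conv_w, frac=0.4):
    """Structured pruning: drop the frac of conv filters with the lowest L1
    norms. conv_w has shape (n_filters, in_ch, kh, kw)."""
    norms = np.abs(conv_w).sum(axis=(1, 2, 3))
    n_keep = conv_w.shape[0] - int(frac * conv_w.shape[0])
    keep = np.sort(np.argsort(norms)[-n_keep:])   # indices of surviving filters
    return conv_w[keep], keep

def quantize_int8(w):
    """Symmetric static 8-bit quantization with a per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Edge-level aggregation over vehicles, then global aggregation over edges.
v1, v2, v3 = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
edge_a = fedavg([v1, v2], counts=[100, 300])
global_w = fedavg([edge_a, v3], counts=[400, 200])

# Prune 40% of a toy conv layer, then quantize the surviving filters.
rng = np.random.default_rng(1)
w = rng.normal(size=(10, 3, 3, 3))
pruned, kept = prune_filters(w, frac=0.4)
q, scale = quantize_int8(pruned)
```

Pruning removes whole filters (so downstream layer shapes shrink too, unlike unstructured sparsity), and int8 storage cuts the weight footprint by 4x versus float32.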
5. Performance Evaluation and Metrics
Rigorous experimental evaluations utilize real-world and simulated datasets:
- Seen attacks (supervised stage):
- Near-perfect F1-scores and Detection Rates (DR), with low False Alarm Rates (FAR), are reported on benchmarks such as Car-Hacking and VeReMi (Althunayyan et al., 2024; Ahsan et al., 2023).
- Unseen/zero-day attacks (unsupervised/anomaly-based stage):
- F1-score of $0.95$ for the LSTM-AE hybrid, with high DR and low FAR (Althunayyan et al., 2024).
- CNN-LSTM classification (after GAN filtering) reaches high accuracy and F1; the stage-wise hybrid maintains strong accuracy even on zero-day cases (S et al., 30 Dec 2025).
- Multi-stage CAV-AD yields accuracy of $97\%$ and above, with high sensor-level F1 under both constant and instant attack profiles (Rahman et al., 2024).
- Baseline comparison:
- Multi-stage DL IDSs achieve DR and FAR comparable to or better than the classical hybrid MTH-IDS, which reports F1 $0.96$ for zero-day attacks but requires biased-classifier correction (Althunayyan et al., 2024; Yang et al., 2021).
- Latency/footprint:
- All advanced models achieve sub-millisecond to millisecond-scale processing per packet/message, with on-device model footprints tailored to hardware with hundreds of MB of memory (Yang et al., 2021; S et al., 30 Dec 2025).
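The metrics used throughout this section follow the standard confusion-matrix definitions; the confusion counts below are illustrative, not from any cited evaluation:

```python
def ids_metrics(tp, fp, tn, fn):
    """Detection rate (recall on attacks), false-alarm rate, and F1-score."""
    dr = tp / (tp + fn)                 # DR: fraction of attacks detected
    far = fp / (fp + tn)                # FAR: benign traffic wrongly flagged
    prec = tp / (tp + fp)               # precision on attack predictions
    f1 = 2 * prec * dr / (prec + dr)    # harmonic mean of precision and DR
    return dr, far, f1

# Toy confusion counts: 1000 attack messages, 10000 benign messages.
dr, far, f1 = ids_metrics(tp=950, fp=20, tn=9980, fn=50)
```

Note that on heavily imbalanced vehicular traffic a small FAR can still mean many raw false alarms, which is why FAR and F1 are reported alongside DR.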
6. Explainability, Robustness, and Interpretation
Multi-stage architectures can incorporate explainable AI mechanisms and external robustness checks to aid in operator trust and forensic action:
- Explainable ensembles: RF/XGB base learners’ output is fused via logistic regression (LR), with SHAP (SHapley Additive exPlanations) applied to yield per-feature contribution and meta-learner weight rationales. Key features (e.g., “age”, “speed”, “recvDelay”) dominate attack prediction (Ahsan et al., 2023). LR meta-weights quantify the reliability of each base model under varying conditions.
- Robust detection: Omni-scale CNN architectures in CAV-AD, sweeping all kernel window sizes, guarantee anomalous patterns of arbitrary time scale are detected. Kalman filter integration yields sensor-level attribution, outperforming GMM baselines by large margins (99% correct sensor identification under instant attacks) (Rahman et al., 2024).
- Adaptation and update: Systems utilizing federated or online-learning frameworks can promptly absorb new attack labels or concept drift, propagating adaptations across the vehicular fleet (Althunayyan et al., 2024).
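For the LR meta-learner case above, SHAP-style per-feature contributions have a closed form: for a linear model with independent features, the exact Shapley value of feature $i$ is $w_i (x_i - \mathbb{E}[x_i])$. The weights, inputs, and feature means below are made-up illustrative values, not taken from the cited system:

```python
import numpy as np

def linear_contributions(w, x, x_mean):
    """Exact per-feature Shapley contributions for a linear model with
    independent features: phi_i = w_i * (x_i - E[x_i])."""
    return w * (x - x_mean)

w = np.array([0.8, 1.5, -0.3])            # hypothetical LR meta-weights
x = np.array([2.0, 0.9, 0.1])             # one sample ("age", "speed", "recvDelay")
x_mean = np.array([1.0, 0.5, 0.1])        # feature means over the background set
phi = linear_contributions(w, x, x_mean)
```

The contributions sum to the model output's deviation from its baseline, so an operator can read off which feature pushed a given sample toward "attack".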
7. Challenges, Trade-offs, and Future Directions
Remaining challenges and design trade-offs for multi-stage IDS in CAVs include:
- Resource vs. accuracy: Model compression and quantization significantly reduce runtime and memory cost at the price of a small accuracy penalty (S et al., 30 Dec 2025).
- Threshold and hyperparameter tuning: Detection thresholds and hyperparameters (e.g., anomaly-score cutoffs and amplification factors) can be dataset- and context-specific, requiring recalibration for new fleets or environments (Rahman et al., 2024).
- Multi-sensor and coordinated attacks: Most current designs (e.g., CAV-AD) assume at most one compromised sensor at a time; robust detection of correlated, multi-point attacks is an open direction (Rahman et al., 2024).
- Computational scaling: Omni-scale kernel sweeps can be costly; future neural architecture search and pruning techniques may approximate their detection efficacy with lower hardware demand.
A plausible implication is that further integration of continual learning, cross-vehicle federated adaptation, and advanced explainability tooling is necessary for deployment of fully adaptive and auditable IDS ecosystems in production CAV fleets.
References
- "A Robust Multi-Stage Intrusion Detection System for In-Vehicle Network Security using Hierarchical Federated Learning" (Althunayyan et al., 2024)
- "FAST-IDS: A Fast Two-Stage Intrusion Detection System with Hybrid Compression for Real-Time Threat Detection in Connected and Autonomous Vehicles" (S et al., 30 Dec 2025)
- "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles" (Yang et al., 2021)
- "CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks" (Rahman et al., 2024)
- "An Explainable Ensemble-based Intrusion Detection System for Software-Defined Vehicle Ad-hoc Networks" (Ahsan et al., 2023)