
DNS-HyXNet: Lightweight DNS Tunnel Detector

Updated 17 December 2025
  • DNS-HyXNet is a lightweight, deployable model that detects DNS tunneling in real time using a two-layer xLSTM network and hybrid feature fusion.
  • It integrates lexical domain-name embeddings with normalized numeric DNS features, eliminating the need for graph-based preprocessing to achieve high accuracy.
  • The model outperforms traditional graph-based systems, reducing per-sample latency to 0.041 ms and enabling efficient real-time threat detection on commodity hardware.

DNS-HyXNet is a lightweight, deployable sequential model for real-time detection of DNS tunneling, designed as an alternative to computationally expensive graph-based approaches. The architecture eliminates recursive graph construction and multi-stage classification in favor of an extended Long Short-Term Memory (xLSTM) network, which directly models temporal dependencies across DNS packet sequences. DNS-HyXNet integrates both lexical embeddings of domain names and normalized numerical DNS features, processing these through a two-layer xLSTM backbone to achieve efficient, single-stage multi-class tunnel detection at line speeds on commodity hardware (Ali et al., 10 Dec 2025).

1. Input Processing and Feature Fusion

DNS-HyXNet's input pipeline constructs a mixed representation for each DNS event by fusing standardized numeric features and hashed domain-name embeddings. The numeric feature vector $n_t \in \mathbb{R}^{d_n}$ includes quantities such as packet length (frame.len), DNS response TTL (dns.resp.ttl), arrival-interval counters, and one-hot encodings of query types. These features are standardized per training split as $\hat n_t = (n_t - \mu) \oslash \sigma$, where $\mu$ and $\sigma$ are the per-feature mean and standard deviation and $\oslash$ denotes element-wise division.

The lexical component processes each label $t_j$ of the queried FQDN by hashing: $\text{bucket}(t_j) = H(t_j) \bmod B$ with $B = 2^{15}$, enabling efficient, collision-tolerant bucketing. The domain label sequence is left-padded to $T = 15$ tokens with a pad index of $0$. Embedding lookup is performed using a matrix $E \in \mathbb{R}^{B \times d_{emb}}$ with $d_{emb} = 64$, yielding the domain embedding $e^{(lex)}_t$. The mixed event representation concatenates the normalized numeric and embedding features: $e_t = [\hat n_t \mathbin\Vert e^{(lex)}_t]$.
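The fusion pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: Python's built-in `hash()` stands in for the unspecified hash function $H$, and mean-pooling the $T$ token embeddings into a single $e^{(lex)}_t$ is an assumption of this sketch (the source does not specify how per-token embeddings are reduced before fusion).

```python
import numpy as np

B_BUCKETS = 2 ** 15   # hash bucket count B
T_TOKENS = 15         # fixed token length after left-padding
D_EMB = 64            # embedding dimension d_emb

rng = np.random.default_rng(0)
# Embedding table E in R^{B x d_emb}; row 0 doubles as the pad vector here.
E = rng.standard_normal((B_BUCKETS, D_EMB)).astype(np.float32)

def bucket(label: str) -> int:
    """Hash a domain label into one of B buckets (0 is reserved for padding).

    Python's built-in hash() stands in for the unspecified hash H.
    """
    return 1 + (hash(label) % (B_BUCKETS - 1))

def tokenize(fqdn: str) -> np.ndarray:
    """Split an FQDN into labels, hash each, and left-pad to T tokens with index 0."""
    ids = [bucket(lbl) for lbl in fqdn.split(".")][:T_TOKENS]
    return np.array([0] * (T_TOKENS - len(ids)) + ids, dtype=np.int64)

def fuse(numeric: np.ndarray, mu: np.ndarray, sigma: np.ndarray, fqdn: str) -> np.ndarray:
    """Standardize numeric features, then concatenate with a pooled lexical embedding."""
    n_hat = (numeric - mu) / sigma          # \hat n_t = (n_t - mu) / sigma
    e_lex = E[tokenize(fqdn)].mean(axis=0)  # mean pooling is this sketch's assumption
    return np.concatenate([n_hat, e_lex])   # e_t = [\hat n_t || e^{(lex)}_t]
```

For a numeric feature vector of dimension $d_n$, the fused event representation has dimension $d_n + 64$.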

2. xLSTM-Based Sequential Modeling

The core of DNS-HyXNet is a stack of two unidirectional xLSTM layers, each with hidden dimension $h = 128$. The xLSTM modifies the standard LSTM forget gate by introducing exponential-forget gating via continuous decay:

  • At each time step tt, the inputs are processed as

$[\tilde{i}_t, \tilde{f}_t, \tilde{o}_t, \tilde{g}_t] = W_x x_t + W_h h_{t-1} + b \in \mathbb{R}^{4h}$

  • Gates are computed as $i_t = \sigma(\tilde{i}_t)$, $o_t = \sigma(\tilde{o}_t)$, $g_t = \tanh(\tilde{g}_t)$.
  • The decay parameter is $\alpha_t = \exp(-\text{softplus}(\tilde{f}_t)) \in (0,1)$.
  • The cell and hidden states are updated by

$c_t = \alpha_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)$

No positional encoding or masking is introduced beyond the padding of domain tokens. Temporal dependencies are learned implicitly via the recurrent structure over windows of $T = 15$ events, with per-sample input tensor $X \in \mathbb{R}^{T \times d_e}$.
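A single recurrence step of the exponential-forget cell can be sketched in NumPy. This is an illustrative re-implementation of the stated update equations, not the authors' code; weight shapes follow the $4h$-dimensional pre-activation above.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xlstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One xLSTM step with exponential-forget gating.

    Shapes: W_x is (4h, d), W_h is (4h, h), b is (4h,),
    matching the pre-activation [i~, f~, o~, g~] in R^{4h}.
    """
    h = h_prev.shape[0]
    pre = W_x @ x_t + W_h @ h_prev + b        # [i~_t, f~_t, o~_t, g~_t]
    i_t = sigmoid(pre[:h])                    # input gate
    f_raw = pre[h:2 * h]                      # raw forget pre-activation f~_t
    o_t = sigmoid(pre[2 * h:3 * h])           # output gate
    g_t = np.tanh(pre[3 * h:])                # candidate update
    alpha = np.exp(-softplus(f_raw))          # continuous decay alpha_t in (0, 1)
    c_t = alpha * c_prev + i_t * g_t          # cell update
    h_t = o_t * np.tanh(c_t)                  # hidden state
    return h_t, c_t
```

Stacking two such layers (the output $h_t$ of the first feeding the second) and iterating over the $T = 15$ window reproduces the backbone's recurrence.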

3. Classification Head and Deployment Efficiency

The final hidden state $h_T$ of the top xLSTM layer is concatenated with the last standardized numeric feature vector $\hat n_T$, forming $z = [\hat n_T \mathbin\Vert h_T] \in \mathbb{R}^{d_n + 128}$. Classification proceeds via a compact multilayer perceptron:

  • $z \to$ Linear$(d_n + 128, 256)$, ReLU, Dropout$(0.2) \to u$
  • $u \to$ Linear$(256, 128)$, ReLU, Dropout$(0.2) \to v$
  • $v \to$ Linear$(128, K) \to$ logits $\ell$
  • $\hat{y} = \text{softmax}(\ell) \in \Delta^{K-1}$
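The head can be sketched as a plain NumPy forward pass. Dropout is omitted because it is inactive at inference, and the dimensions $d_n$ and $K$ below are placeholder assumptions, not values restated from the source.

```python
import numpy as np

rng = np.random.default_rng(1)

D_N, H, K = 10, 128, 13   # numeric dim d_n (assumed), hidden size h, class count K (assumed)

def linear(d_in, d_out):
    """Randomly initialized dense layer (weights, bias) for this sketch."""
    return rng.standard_normal((d_out, d_in)) * 0.05, np.zeros(d_out)

W1, b1 = linear(D_N + H, 256)
W2, b2 = linear(256, 128)
W3, b3 = linear(128, K)

def relu(x):
    return np.maximum(x, 0.0)

def head(n_hat_T, h_T):
    """z = [n_hat_T || h_T] -> 256 -> 128 -> K logits -> softmax probabilities."""
    z = np.concatenate([n_hat_T, h_T])
    u = relu(W1 @ z + b1)
    v = relu(W2 @ u + b2)
    logits = W3 @ v + b3
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()
```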

DNS-HyXNet has approximately 2.4 million parameters, sustains an inference latency of 0.041 ms per sample (approximately 24,000 samples/s on a commodity GPU), and peaks below 1 GB of memory usage. No graph state or additional preprocessing is required.

4. Training Procedures and Experimental Evaluation

Training uses the AdamW optimizer with a learning rate of $2 \times 10^{-3}$ and weight decay $10^{-4}$, together with mixed-precision training, gradient clipping (1.0), a ReduceLROnPlateau schedule, and early stopping.
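The scheduling logic can be sketched in pure Python. The decay factor and patience values below are illustrative assumptions; the source reports only the optimizer settings, the clipping threshold, and the use of ReduceLROnPlateau and early stopping.

```python
# Sketch of the two stopping/scheduling mechanisms named above:
# ReduceLROnPlateau lowers the LR when validation loss stalls,
# and early stopping halts training after a longer stall.

LR0, WEIGHT_DECAY, CLIP_NORM = 2e-3, 1e-4, 1.0  # reported hyperparameters

def run_schedule(val_losses, lr=LR0, factor=0.5, lr_patience=2, stop_patience=5):
    """Return (final_lr, epochs_run) given a sequence of validation losses.

    factor, lr_patience, and stop_patience are illustrative assumptions.
    """
    best, since_best, since_drop = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_drop = loss, 0, 0
        else:
            since_best += 1
            since_drop += 1
        if since_best >= stop_patience:   # early stopping: no improvement for too long
            return lr, epoch
        if since_drop >= lr_patience:     # plateau detected: reduce learning rate
            lr *= factor
            since_drop = 0
    return lr, len(val_losses)
```

For example, a validation loss that improves once and then plateaus triggers two LR reductions before early stopping ends the run.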

DNS-HyXNet was evaluated primarily on two benchmarks:

  • DNS-Tunnel-Datasets (Gao et al. 2024): 13 classes ($K = 13$): 1 benign, 1 wildcard, and 11 tunneling families (e.g., dnscat2, iodine, dns2tcp, OzymanDNS, CobaltStrike).
  • CIC-Bell-DNS-EXF-2021: binary benign vs exfiltration.

Typical splits ranged from 60–70% training and 20% validation, with the remainder used for testing.

Performance metrics follow standard definitions for accuracy, class-wise precision and recall, and the macro-averaged $P_{macro}$, $R_{macro}$, and $F1_{macro}$:

| Dataset | Accuracy | $P_{macro}$ | $R_{macro}$ | $F1_{macro}$ | Per-sample latency |
| --- | --- | --- | --- | --- | --- |
| DNS-Tunnel-Datasets | 99.99% | ≥99.96% | ≥99.96% | ≥99.96% | 0.041 ms |
| CIC-Bell-DNS-EXF-2021 | 99.71% | 99.75% | 99.63% | 99.69% | 0.041 ms |

On the DNS-Tunnel-Datasets benchmark, per-class precision and recall both exceeded 0.999. On CIC-Bell-DNS-EXF-2021, there were 61 misclassifications out of 20,977 samples.
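The macro-averaged metrics in the table follow the standard definitions: per-class precision and recall from confusion counts, averaged uniformly over classes, with $F1_{macro}$ as the mean of per-class F1 scores. A minimal reference implementation:

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Compute macro-averaged precision, recall, and F1.

    Per class k: P_k = TP_k / (TP_k + FP_k), R_k = TP_k / (TP_k + FN_k),
    F1_k = 2 * P_k * R_k / (P_k + R_k); each macro metric is the uniform
    mean over classes (zero is used where a denominator vanishes).
    """
    precisions, recalls, f1s = [], [], []
    for k in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        p_k = tp / (tp + fp) if tp + fp else 0.0
        r_k = tp / (tp + fn) if tp + fn else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if p_k + r_k else 0.0
        precisions.append(p_k)
        recalls.append(r_k)
        f1s.append(f_k)
    n = num_classes
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```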

5. Comparative Analysis: DNS-HyXNet vs. GraphTunnel

In direct comparison to GraphTunnel (Gao et al. 2024), which employs a GNN over recursive resolution graphs and a CNN tool identifier within a two-stage pipeline, DNS-HyXNet demonstrates superior operational efficiency.

  • GraphTunnel requires 48.7 s of graph preprocessing, versus 34.2 s for DNS-HyXNet's feature preprocessing.
  • Per-sample inference: 4.46 ms (GraphTunnel) vs. 0.041 ms (DNS-HyXNet), roughly a two-orders-of-magnitude improvement.
  • DNS-HyXNet's memory footprint remains under 1 GB, and no separate graph state must be maintained.
  • On wildcard-family experiments, DNS-HyXNet achieves an F1 of 0.9999 compared to GraphTunnel's 0.9978.

DNS-HyXNet’s graph-free design enables sub-millisecond classification and deployability on edge hardware, consolidating tunnel family discrimination into a single end-to-end pass.

6. Ablation Observations and Model Robustness

The source manuscript does not present explicit ablation studies isolating the contribution of lexical embeddings versus numeric features, or varying xLSTM depth (e.g., L=1L=1 vs L=2L=2), dropout, or embedding size. Authors attribute robustness primarily to the exponential-forget gating mechanism and hybrid fusion of feature types. Detailed quantitative ablation tables were not provided. A plausible implication is that the hybrid approach—juxtaposing domain-name tokenization and statistical DNS attributes—underpins the model’s generalization across diverse family types.

7. Technical Significance and Deployment Considerations

DNS-HyXNet demonstrates that a lightweight two-layer xLSTM processing mixed hashed domain-label embeddings and normalized numeric DNS features can achieve parity with, or exceed, the accuracy of complex graph-based detectors while operating at line speed on commodity hardware. The design is compact, requiring no recursive parsing or graph construction, and is suitable for real-time threat detection scenarios. This suggests sequential modeling with xLSTM is a viable replacement for traditional recursive-graph approaches in DNS tunnel detection, enabling deployment without specialized infrastructure (Ali et al., 10 Dec 2025).
