
DNS-HyXNet: Lightweight DNS Tunnel Detector

Updated 17 December 2025
  • DNS-HyXNet is a lightweight, deployable model that detects DNS tunneling in real time using a two-layer xLSTM network and hybrid feature fusion.
  • It integrates lexical domain-name embeddings with normalized numeric DNS features, eliminating the need for graph-based preprocessing to achieve high accuracy.
  • The model outperforms traditional graph-based systems, reducing per-sample latency to 0.041 ms and enabling efficient real-time threat detection on commodity hardware.

DNS-HyXNet is a lightweight, deployable sequential model for real-time detection of DNS tunneling, designed as an alternative to computationally expensive graph-based approaches. The architecture eliminates recursive graph construction and multi-stage classification in favor of an extended Long Short-Term Memory (xLSTM) network, which directly models temporal dependencies across DNS packet sequences. DNS-HyXNet integrates both lexical embeddings of domain names and normalized numerical DNS features, processing these through a two-layer xLSTM backbone to achieve efficient, single-stage multi-class tunnel detection at line speeds on commodity hardware (Ali et al., 10 Dec 2025).

1. Input Processing and Feature Fusion

DNS-HyXNet's input pipeline constructs a mixed representation for each DNS event by fusing standardized numeric features and hashed domain-name embeddings. The numeric feature vector $n_t \in \mathbb{R}^{d_n}$ includes quantities such as packet length (frame.len), DNS response TTL (dns.resp.ttl), arrival-interval counters, and one-hot encodings of query types. These features are standardized per training split as $\hat n_t = (n_t - \mu) \oslash \sigma$, where $\mu$ and $\sigma$ are the per-feature mean and standard deviation and $\oslash$ denotes element-wise division.

The lexical component processes each label $t_j$ of the queried FQDN by hashing: $\text{bucket}(t_j) = H(t_j) \bmod B$ with $B = 2^{15}$, enabling efficient, collision-tolerant bucketing. The domain label sequence is left-padded to $T = 15$ tokens with a pad index of $0$. Embedding lookup is performed using a matrix $E \in \mathbb{R}^{B \times d_{emb}}$ with $d_{emb} = 64$, yielding the domain embedding $e^{(lex)}_t$. The mixed event representation concatenates the normalized numeric and embedding features: $e_t = [\hat n_t \mathbin\Vert e^{(lex)}_t]$.
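The fusion pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: Python's built-in `hash()` stands in for the unspecified hash function $H$, and mean-pooling the $T$ token embeddings into a single $e^{(lex)}_t$ is an assumption of this sketch (the source does not specify how per-token embeddings are reduced before fusion).

```python
import numpy as np

B_BUCKETS = 2 ** 15   # hash bucket count B
T_TOKENS = 15         # fixed token length after left-padding
D_EMB = 64            # embedding dimension d_emb

rng = np.random.default_rng(0)
# Embedding table E in R^{B x d_emb}; row 0 doubles as the pad vector here.
E = rng.standard_normal((B_BUCKETS, D_EMB)).astype(np.float32)

def bucket(label: str) -> int:
    """Hash a domain label into one of B buckets (0 is reserved for padding).

    Python's built-in hash() stands in for the unspecified hash H.
    """
    return 1 + (hash(label) % (B_BUCKETS - 1))

def tokenize(fqdn: str) -> np.ndarray:
    """Split an FQDN into labels, hash each, and left-pad to T tokens with index 0."""
    ids = [bucket(lbl) for lbl in fqdn.split(".")][:T_TOKENS]
    return np.array([0] * (T_TOKENS - len(ids)) + ids, dtype=np.int64)

def fuse(numeric: np.ndarray, mu: np.ndarray, sigma: np.ndarray, fqdn: str) -> np.ndarray:
    """Standardize numeric features, then concatenate with a pooled lexical embedding."""
    n_hat = (numeric - mu) / sigma          # \hat n_t = (n_t - mu) / sigma
    e_lex = E[tokenize(fqdn)].mean(axis=0)  # mean pooling is this sketch's assumption
    return np.concatenate([n_hat, e_lex])   # e_t = [\hat n_t || e^{(lex)}_t]
```

For a numeric feature vector of dimension $d_n$, the fused event representation has dimension $d_n + 64$.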

2. xLSTM-Based Sequential Modeling

The core of DNS-HyXNet is a stack of two unidirectional xLSTM layers, each with hidden dimension $h = 128$. The xLSTM modifies the standard LSTM forget gate by introducing exponential-forget gating via continuous decay:

  • At each time step tt, the inputs are processed as

$[\tilde{i}_t, \tilde{f}_t, \tilde{o}_t, \tilde{g}_t] = W_x x_t + W_h h_{t-1} + b \in \mathbb{R}^{4h}$

  • Gates are computed as $i_t = \sigma(\tilde{i}_t)$, $o_t = \sigma(\tilde{o}_t)$, $g_t = \tanh(\tilde{g}_t)$.
  • The decay parameter is $\alpha_t = \exp(-\text{softplus}(\tilde{f}_t)) \in (0,1)$.
  • The cell and hidden states are updated by

$c_t = \alpha_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)$

No positional encoding or masking is introduced beyond the padding of domain tokens. Temporal dependencies are learned implicitly via the recurrent structure over windows of $T = 15$ events, with per-sample input tensor $X \in \mathbb{R}^{T \times d_e}$.
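A single recurrence step of the exponential-forget cell can be sketched in NumPy. This is an illustrative re-implementation of the stated update equations, not the authors' code; weight shapes follow the $4h$-dimensional pre-activation above.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xlstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One xLSTM step with exponential-forget gating.

    Shapes: W_x is (4h, d), W_h is (4h, h), b is (4h,),
    matching the pre-activation [i~, f~, o~, g~] in R^{4h}.
    """
    h = h_prev.shape[0]
    pre = W_x @ x_t + W_h @ h_prev + b        # [i~_t, f~_t, o~_t, g~_t]
    i_t = sigmoid(pre[:h])                    # input gate
    f_raw = pre[h:2 * h]                      # raw forget pre-activation f~_t
    o_t = sigmoid(pre[2 * h:3 * h])           # output gate
    g_t = np.tanh(pre[3 * h:])                # candidate update
    alpha = np.exp(-softplus(f_raw))          # continuous decay alpha_t in (0, 1)
    c_t = alpha * c_prev + i_t * g_t          # cell update
    h_t = o_t * np.tanh(c_t)                  # hidden state
    return h_t, c_t
```

Stacking two such layers (the output $h_t$ of the first feeding the second) and iterating over the $T = 15$ window reproduces the backbone's recurrence.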

3. Classification Head and Deployment Efficiency

The final hidden state $h_T$ of the top xLSTM layer is concatenated with the last standardized numeric feature vector $\hat n_T$, forming $z = [\hat n_T \mathbin\Vert h_T] \in \mathbb{R}^{d_n + 128}$. Classification proceeds via a compact multilayer perceptron:

  • $z \to$ Linear$(d_n + 128, 256)$, ReLU, Dropout$(0.2) \to u$
  • $u \to$ Linear$(256, 128)$, ReLU, Dropout$(0.2) \to v$
  • $v \to$ Linear$(128, K) \to$ logits $\ell$
  • $\hat{y} = \text{softmax}(\ell) \in \Delta^{K-1}$
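The head can be sketched as a plain NumPy forward pass. Dropout is omitted because it is inactive at inference, and the dimensions $d_n$ and $K$ below are placeholder assumptions, not values restated from the source.

```python
import numpy as np

rng = np.random.default_rng(1)

D_N, H, K = 10, 128, 13   # numeric dim d_n (assumed), hidden size h, class count K (assumed)

def linear(d_in, d_out):
    """Randomly initialized dense layer (weights, bias) for this sketch."""
    return rng.standard_normal((d_out, d_in)) * 0.05, np.zeros(d_out)

W1, b1 = linear(D_N + H, 256)
W2, b2 = linear(256, 128)
W3, b3 = linear(128, K)

def relu(x):
    return np.maximum(x, 0.0)

def head(n_hat_T, h_T):
    """z = [n_hat_T || h_T] -> 256 -> 128 -> K logits -> softmax probabilities."""
    z = np.concatenate([n_hat_T, h_T])
    u = relu(W1 @ z + b1)
    v = relu(W2 @ u + b2)
    logits = W3 @ v + b3
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()
```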

DNS-HyXNet has approximately 2.4 million parameters, sustains an inference latency of 0.041 ms per sample (approximately 24,000 samples/s on a commodity GPU), and peaks below 1 GB of memory usage. No graph state or additional preprocessing is required.

4. Training Procedures and Experimental Evaluation

Training uses the AdamW optimizer with a learning rate of $2 \times 10^{-3}$ and weight decay $10^{-4}$, together with mixed-precision training, gradient clipping (1.0), a ReduceLROnPlateau schedule, and early stopping.
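The scheduling logic can be sketched in pure Python. The decay factor and patience values below are illustrative assumptions; the source reports only the optimizer settings, the clipping threshold, and the use of ReduceLROnPlateau and early stopping.

```python
# Sketch of the two stopping/scheduling mechanisms named above:
# ReduceLROnPlateau lowers the LR when validation loss stalls,
# and early stopping halts training after a longer stall.

LR0, WEIGHT_DECAY, CLIP_NORM = 2e-3, 1e-4, 1.0  # reported hyperparameters

def run_schedule(val_losses, lr=LR0, factor=0.5, lr_patience=2, stop_patience=5):
    """Return (final_lr, epochs_run) given a sequence of validation losses.

    factor, lr_patience, and stop_patience are illustrative assumptions.
    """
    best, since_best, since_drop = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_drop = loss, 0, 0
        else:
            since_best += 1
            since_drop += 1
        if since_best >= stop_patience:   # early stopping: no improvement for too long
            return lr, epoch
        if since_drop >= lr_patience:     # plateau detected: reduce learning rate
            lr *= factor
            since_drop = 0
    return lr, len(val_losses)
```

For example, a validation loss that improves once and then plateaus triggers two LR reductions before early stopping ends the run.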

DNS-HyXNet was evaluated primarily on two benchmarks:

  • DNS-Tunnel-Datasets (Gao et al. 2024): 13 classes ($K = 13$): 1 benign, 1 wildcard, and 11 tunneling families (e.g., dnscat2, iodine, dns2tcp, OzymanDNS, CobaltStrike).
  • CIC-Bell-DNS-EXF-2021: binary benign vs exfiltration.

Typical splits ranged from 60–70% training and 20% validation, with the remainder used for testing.

Performance metrics follow standard definitions for accuracy, class-wise precision and recall, and the macro-averaged $P_{macro}$, $R_{macro}$, and $F1_{macro}$:

| Dataset | Accuracy | $P_{macro}$ | $R_{macro}$ | $F1_{macro}$ | Per-sample latency |
| --- | --- | --- | --- | --- | --- |
| DNS-Tunnel-Datasets | 99.99% | ≥99.96% | ≥99.96% | ≥99.96% | 0.041 ms |
| CIC-Bell-DNS-EXF-2021 | 99.71% | 99.75% | 99.63% | 99.69% | 0.041 ms |

On the DNS-Tunnel-Datasets benchmark, per-class precision and recall both exceeded 0.999. On CIC-Bell-DNS-EXF-2021, there were 61 misclassifications out of 20,977 samples.
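The macro-averaged metrics in the table follow the standard definitions: per-class precision and recall from confusion counts, averaged uniformly over classes, with $F1_{macro}$ as the mean of per-class F1 scores. A minimal reference implementation:

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Compute macro-averaged precision, recall, and F1.

    Per class k: P_k = TP_k / (TP_k + FP_k), R_k = TP_k / (TP_k + FN_k),
    F1_k = 2 * P_k * R_k / (P_k + R_k); each macro metric is the uniform
    mean over classes (zero is used where a denominator vanishes).
    """
    precisions, recalls, f1s = [], [], []
    for k in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        p_k = tp / (tp + fp) if tp + fp else 0.0
        r_k = tp / (tp + fn) if tp + fn else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if p_k + r_k else 0.0
        precisions.append(p_k)
        recalls.append(r_k)
        f1s.append(f_k)
    n = num_classes
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```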

5. Comparative Analysis: DNS-HyXNet vs. GraphTunnel

In direct comparison to GraphTunnel (Gao et al. 2024), which employs a GNN over recursive resolution graphs and a CNN tool identifier within a two-stage pipeline, DNS-HyXNet demonstrates superior operational efficiency.

  • GraphTunnel requires 48.7 s of graph preprocessing, versus 34.2 s for DNS-HyXNet's feature preprocessing.
  • Per-sample inference: 4.46 ms (GraphTunnel) vs. 0.041 ms (DNS-HyXNet), roughly a two-orders-of-magnitude improvement.
  • DNS-HyXNet's memory footprint remains under 1 GB, and no separate graph state must be maintained.
  • On wildcard-family experiments, DNS-HyXNet achieves an F1 of 0.9999 compared to GraphTunnel's 0.9978.

DNS-HyXNet’s graph-free design enables sub-millisecond classification and deployability on edge hardware, consolidating tunnel family discrimination into a single end-to-end pass.

6. Ablation Observations and Model Robustness

The source manuscript does not present explicit ablation studies isolating the contribution of lexical embeddings versus numeric features, or varying xLSTM depth (e.g., L=1L=1 vs L=2L=2), dropout, or embedding size. Authors attribute robustness primarily to the exponential-forget gating mechanism and hybrid fusion of feature types. Detailed quantitative ablation tables were not provided. A plausible implication is that the hybrid approach—juxtaposing domain-name tokenization and statistical DNS attributes—underpins the model’s generalization across diverse family types.

7. Technical Significance and Deployment Considerations

DNS-HyXNet demonstrates that a lightweight two-layer xLSTM processing mixed hashed domain-label embeddings and normalized numeric DNS features can achieve parity with, or exceed, the accuracy of complex graph-based detectors while operating at line speed on commodity hardware. The design is compact, requiring no recursive parsing or graph construction, and is suitable for real-time threat detection scenarios. This suggests sequential modeling with xLSTM is a viable replacement for traditional recursive-graph approaches in DNS tunnel detection, enabling deployment without specialized infrastructure (Ali et al., 10 Dec 2025).
