DNS-Tunnel Datasets for Detection Research
- DNS-Tunnel-Datasets are structured collections of labeled DNS traffic, capturing both benign and diverse tunneling tool behaviors for covert data exfiltration analysis.
- They offer raw packet logs and feature-engineered representations, including domain embeddings and numeric DNS features, to support various detection methodologies.
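- Reported benchmarks range from 75% session-level protocol inference (entropy-based) to 99.99% multi-class accuracy (DNS-HyXNet), with a false positive rate near 0.8% for lexical CNN detection. This supports their use in real-time tunnel detection and protocol inference research.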
The DNS-Tunnel-Datasets are a family of labeled, structured datasets specifically developed to support research in DNS tunneling analysis and detection. DNS tunneling encodes arbitrary data within DNS queries and responses, facilitating covert data exfiltration, command-and-control channels, and protocol bridging. DNS-Tunnel-Datasets systematically capture the multi-class and multi-tool diversity of tunneled and benign DNS traffic, providing standardized resources for reproducible protocol inference, anomaly detection, and machine learning evaluation on DNS tunnel scenarios (Ali et al., 10 Dec 2025).
1. Dataset Origins and Composition
DNS-Tunnel-Datasets consolidate several domain-specific corpora. Each targets a distinct aspect of DNS tunneling and its detection, either in lab-controlled settings or in the wild.
a. Gao et al. DNS-Tunnel-Datasets (2024):
- The primary dataset, curated by Gao et al., comprises raw DNS query and response packet logs across twelve classes:
- Benign (normal recursive queries)
- Wildcard DNS responses (authoritative wildcard behaviors)
- Ten tunneling tool families: tcp-over-dns, dnscat2, andiodine, dns2tcp, iodine, dnspot, dns-shell, tuns, CobaltStrike, and OzymanDNS
- Approximate class proportions: benign (58%), wildcard (4%), each tunneling tool (2–4%) (Ali et al., 10 Dec 2025)
- Data splits: 60% train, 20% validation, 20% test; additional reduced-data splits as low as 10% labeled training data
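The 60/20/20 partition and the reduced-label variants can be reproduced along the following lines. This is a minimal sketch that assumes a per-class (stratified) shuffle with a fixed seed. It also assumes the reduced-data setting simply subsamples the training partition. The function name `stratified_split` and its `train_keep` parameter are hypothetical and are not part of the dataset release.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.6, 0.2, 0.2), train_keep=1.0, seed=0):
    """Return (train, val, test) index lists, split separately within each class.

    Hypothetical helper: per-class stratification and the `train_keep`
    subsampling of the training partition are assumptions, not the
    dataset authors' published procedure.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train, val, test = [], [], []
    for lab in sorted(by_class):          # fixed class order keeps runs reproducible
        idxs = by_class[lab]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        cls_train = idxs[:n_train]
        # Reduced-label setting: keep only a fraction of this class's training indices.
        keep = max(1, int(len(cls_train) * train_keep)) if cls_train else 0
        train.extend(cls_train[:keep])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])  # remainder goes to test
    return train, val, test
```

With the same seed, a call with `train_keep=0.1` returns a subset of the full training indices and leaves validation and test unchanged. Under the stated assumption, this mirrors the reduced-label experiments while keeping the evaluation data fixed.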
b. Additional Datasets in the Ecosystem:
- Homem et al. (2017): DNS tunnel capture (Iodine tool, HTTP/FTP over DNS, 20 tunnel sessions, 2 plain sessions) for protocol inference (Homem et al., 2017)
- Palau et al. DNSTunnelData (2020): VM-based, 5 tunneling tools, ≈8000 tunneling and ≈8500 benign domains, HDI/CSV format (Palau et al., 2020)
- Meyer et al. (2019): Passive DNS (pDNS) measurement of tunnel domains in the wild, 273 suspicious tunnel SLDs extracted from 2B+ new FQDNs, with meta-features and clustering (Tatang et al., 2019)
All datasets are publicly downloadable for research under open licenses, e.g., https://github.com/ggyggy666/DNS-Tunnel-Datasets and https://github.com/PalauLab/DNSTunnelData (Ali et al., 10 Dec 2025, Palau et al., 2020).
2. Labeling, Class Taxonomy, and Ground Truth
The core DNS-Tunnel-Datasets employ a multi-class taxonomy:
| Class | Description | Example Tools |
|---|---|---|
| benign | Legitimate recursive DNS traffic | n/a |
| wildcard | Responses from wildcard records | n/a |
| tcp-over-dns | TCP tunneled in DNS queries | tcp-over-dns |
| dnscat2 | C2 and bidirectional comms over DNS | dnscat2 |
| andiodine | Variant of Iodine | andiodine |
| dns2tcp | TCP-over-DNS tunneling | dns2tcp |
| iodine | IPv4-over-DNS | iodine |
| dnspot | DNSSpot tunnel traffic | dnspot |
| dns-shell | Shell via DNS | dns-shell |
| tuns | IPv4-through-DNS tool | tuns |
| CobaltStrike | Red-team C2 framework with DNS beacons | CobaltStrike |
| OzymanDNS | Exfiltration tool over DNS | OzymanDNS |
Ground truth is established by controlled tool execution or rigorous pDNS filtering, often referencing synchronized script logs (“time-injection” alignment) or explicit domain whitelists/blacklists (Ali et al., 10 Dec 2025, Palau et al., 2020, Tatang et al., 2019).
The Palau et al. dataset encodes ground-truth with the fields label (0=normal, 1=tunneling) and tool_code (tool identifier) (Palau et al., 2020).
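A minimal loading sketch follows. It assumes a CSV with a header row that contains the `label` and `tool_code` fields described above. The domain column name `qname`, the helper `load_palau_rows`, and the inline sample rows are hypothetical placeholders, so check the repository's actual header and tool codes before use.

```python
import csv
import io
from collections import Counter

def load_palau_rows(handle):
    """Yield (qname, is_tunnel, tool_code) tuples from a DNSTunnelData-style CSV.

    The 'qname' column name is an assumption; 'label' and 'tool_code' follow the text.
    """
    for row in csv.DictReader(handle):
        is_tunnel = int(row["label"]) == 1   # 0 = normal, 1 = tunneling
        yield row["qname"], is_tunnel, row["tool_code"]

# Hypothetical rows for illustration only; real tool_code values may differ.
sample = io.StringIO(
    "qname,label,tool_code\n"
    "www.example.com,0,0\n"
    "aGVsbG8.t.example.org,1,3\n"
)
rows = list(load_palau_rows(sample))
print(Counter(is_tunnel for _, is_tunnel, _ in rows))  # class balance check
```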
3. Feature Representation and Preprocessing
DNS-Tunnel-Datasets provide both raw and feature-engineered fields capable of supporting symbolic, statistical, or deep sequence models:
- Domain Embeddings: Fully qualified domain names (FQDNs) are tokenized by label (splitting on “.”). Each token is hashed into one of B = 2¹⁵ = 32,768 buckets and embedded as a 64-dimensional dense vector. Sequences are padded or truncated to T = 15 tokens so that every input has the same length (Ali et al., 10 Dec 2025).
- Numeric DNS Features: Standardized (z-scored) features from the DNS packet: frame length, response TTL, query/response counts per window, inter-arrival time Δt, and Shannon entropy of the subdomain. These accommodate statistical modeling or serve as direct input to hybrid neural architectures (Ali et al., 10 Dec 2025).
- Raw and CSV/JSON Schemas: Per-packet fields include timestamp, src_ip, dst_ip, protocol_label, tunnel_flag, packet_index, byte_entropy, and tool-specific features (see per-dataset schema for full details) (Homem et al., 2017, Palau et al., 2020, Tatang et al., 2019).
- No Pre-computed Lexical Stats: In Palau et al., only raw QNAMEs are distributed, leaving n-gram, entropy, and lexical ratio feature extraction to downstream users (Palau et al., 2020).
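The per-query preprocessing in this list can be approximated with the standard library. This is a minimal sketch under stated assumptions, not the DNS-HyXNet implementation. The following choices are assumptions:

- BLAKE2b as the label hash.
- Keeping the leftmost labels when truncating.
- Reserving index `B` for padding.
- Treating everything left of the last two labels as the subdomain. A real pipeline would consult the Public Suffix List instead.

All helper names and the extra lexical ratios are hypothetical. The token ids would index a 64-dimensional embedding table inside a model. The window counts, TTL, and Δt features are omitted because they need packet context.

```python
import hashlib
import math
from collections import Counter
from statistics import mean, pstdev

B = 2 ** 15      # hash bucket count (from the text)
T = 15           # fixed token sequence length (from the text)
PAD_ID = B       # assumption: padding gets its own index beyond the hash buckets

def label_ids(fqdn: str) -> list[int]:
    """Hash each DNS label into [0, B) and pad/truncate to exactly T ids."""
    labels = [lab for lab in fqdn.lower().rstrip(".").split(".") if lab]
    ids = []
    for lab in labels[:T]:  # assumption: keep the leftmost T labels
        digest = hashlib.blake2b(lab.encode("utf-8"), digest_size=8).digest()
        ids.append(int.from_bytes(digest, "big") % B)
    return ids + [PAD_ID] * (T - len(ids))

def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

def subdomain(fqdn: str) -> str:
    """Naive subdomain: all labels left of the last two, joined without dots."""
    labels = fqdn.lower().rstrip(".").split(".")
    return "".join(labels[:-2])

def lexical_features(qname: str) -> dict[str, float]:
    """Hypothetical downstream lexical features computed from a raw QNAME."""
    sub = subdomain(qname)
    n = max(len(sub), 1)
    return {
        "sub_entropy": shannon_entropy(sub),
        "sub_length": float(len(sub)),
        "digit_ratio": sum(ch.isdigit() for ch in sub) / n,
        "label_count": float(qname.rstrip(".").count(".") + 1),
    }

def zscore(column: list[float]) -> list[float]:
    """Standardize one feature column (fit and apply on the same data here)."""
    mu, sigma = mean(column), pstdev(column)
    sigma = sigma or 1.0  # guard against constant columns
    return [(x - mu) / sigma for x in column]
```

In practice, the z-score mean and standard deviation would be fit on the training split only and then reused for validation and test. This avoids leaking test statistics into the model.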
4. Detection Methodologies and Benchmarking
DNS-Tunnel-Datasets underlie several detection paradigms:
- Sequence Models: DNS-HyXNet applies a hybrid xLSTM architecture. It ingests concatenated [numeric, embedding] per-packet vectors and learns multi-class discrimination with a two-layer xLSTM stack. After $T$ steps, the last hidden state $h_T$ and the numeric vector $\hat{n}_T$ are concatenated and classified by a 3-layer MLP. Its softmax output gives multinomial tunnel-class probabilities (Ali et al., 10 Dec 2025).
- Entropy-Based Classification: Homem et al. extract the per-packet Shannon entropy $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the relative frequency of byte value $i$, and use mean-difference classification. For each session, the mean entropy $\bar{H}$ is compared with per-protocol reference means $\mu_c$. The label $\arg\min_c |\bar{H} - \mu_c|$ (“MeanDiff”) is chosen. Reported accuracy is 75% (15/20 sessions), with protocol-specific confusion metrics (Homem et al., 2017).
- CNN-based Lexicographical Detection: Palau et al. model QNAME strings as integer sequences passed to a 1D ConvNet with filters trained for 4-gram local patterns, achieving detection F1 ≈ 0.96 with a 0.8% false positive rate (Palau et al., 2020).
- Step-wise Filtering and Profiling: Meyer et al. identify tunnel domains from 2B+ pDNS queries via a filter cascade (rrtype, whitelist, FQDN level, fan-out, domain profiling) and manually tag resulting domains as Service/Organization/Private/Other, mapping to tool families by structural fingerprint (Tatang et al., 2019).
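The MeanDiff rule is simple enough to sketch directly. This minimal version makes two assumptions. First, per-packet entropy is computed over raw payload bytes. Second, each protocol's reference mean is the average of its training sessions' mean entropies. The function names and the toy reference values are hypothetical and are not taken from Homem et al.

```python
import math
from collections import Counter
from statistics import mean

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of a packet payload in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())

def session_mean_entropy(packets: list[bytes]) -> float:
    """Mean of the per-packet entropies in one session."""
    return mean(byte_entropy(p) for p in packets)

def fit_reference_means(training: dict[str, list[list[bytes]]]) -> dict[str, float]:
    """Reference mean per protocol: average of its training sessions' mean entropies."""
    return {proto: mean(session_mean_entropy(s) for s in sessions)
            for proto, sessions in training.items()}

def mean_diff_classify(packets: list[bytes], refs: dict[str, float]) -> str:
    """Pick the protocol whose reference mean is closest to the session mean."""
    m = session_mean_entropy(packets)
    return min(refs, key=lambda proto: abs(refs[proto] - m))

if __name__ == "__main__":
    refs = {"http": 4.5, "ftp": 7.5}  # toy reference means, not from the paper
    print(mean_diff_classify([bytes(range(256))] * 3, refs))  # prints ftp
```

The classifier has no trainable parameters beyond one mean per protocol. This makes it cheap but sensitive to protocols whose entropy distributions overlap, which is consistent with the moderate session-level accuracy reported for this approach.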
5. Composition, Distribution, and Access
The principal DNS-Tunnel-Datasets and related corpora compare as follows in tool coverage, class balance, format, and access:
| Dataset | Tool Families | Benign Samples (share or count) | Tunnel Samples (share or count) | Classes | Main Format | Access URL |
|---|---|---|---|---|---|---|
| DNS-Tunnel-Datasets (Gao) | 10 (+ benign, wildcard) | 58% | 38% | 12 | PCAP/table | https://github.com/ggyggy666/DNS-Tunnel-Datasets |
| DNSTunnelData (Palau) | 5 | 8,511 | 8,000 | 2 (+ tool_code) | CSV | https://github.com/PalauLab/DNSTunnelData |
| Homem et al. (2017) | 1 (Iodine) + plain | n/a | 20 sessions | 2+ref | PCAP/CSV | Code in TunnelStatsTests GitHub |
| Meyer et al. (2019) | pDNS wild, profiled | n/a | 273 SLDs | n/a | CSV/JSON | https://www.rub.de/dns-tunnels-dataset |
Depending on the dataset, privacy is handled by full session anonymization or by releasing only summary/statistical features. Licenses vary (MIT, CC BY-NC 4.0, CC BY 4.0) and permit broad academic reuse, although the NC license excludes commercial use.
6. Benchmark Performance and Limitations
Benchmarking on DNS-Tunnel-Datasets enables rigorous evaluation of detection pipelines:
- DNS-HyXNet Results (Ali et al., 10 Dec 2025):
- Macro-averaged accuracy: 99.99%
- Macro precision/recall/F1: ≥ 99.96%
- Average detection time: 0.041 ms/sample, throughput ≈24,215 samples/s
- Per-class F1-scores all >0.9997, indicating near-perfect inter-class separation
- Homem et al. MeanDiff Classifier (Homem et al., 2017):
- 75% tunnel protocol inference accuracy using entropy histograms
- Per-protocol false positive rate, recall, and precision are reported explicitly
- Palau et al. CNN Lexicographical Classifier (Palau et al., 2020):
- Overall balanced F1 ≈ 0.9605, tunneling recall 92.4%, normal recall 99.5%
- “More than 92% of total tunneling domains detected with a false positive rate close to 0.8%”
- Meyer et al. Wild Dataset (Tatang et al., 2019):
- 273 suspicious tunnel SLDs discovered, “iodine-like” accounting for ≈80% of labeled tunnel behavior
All results are reported verbatim from the cited sources, with reproducible splits and confusion matrices.
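The macro scores quoted above weight every class equally, so small tool classes count as much as the large benign class. A minimal sketch of the computation from a multi-class confusion matrix follows. The convention that `cm[i][j]` counts true class `i` predicted as `j`, the helper names, and the toy matrix are assumptions for illustration.

```python
def per_class_scores(cm):
    """Return (precision, recall, F1) per class; cm[i][j] = true i predicted as j."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, true class differs
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores

def macro_average(cm):
    """Unweighted mean of per-class precision, recall, and F1."""
    scores = per_class_scores(cm)
    k = len(scores)
    return tuple(sum(s[i] for s in scores) / k for i in range(3))

def accuracy(cm):
    """Fraction of all samples on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

cm = [[8, 2], [0, 10]]  # toy 2-class matrix, not from any cited result
print(macro_average(cm), accuracy(cm))
```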
7. Applicability, Extensions, and Community Impact
DNS-Tunnel-Datasets serve as benchmarks for:
- Real-time tunnel detection under adversarial, open-world conditions
- Multi-family and multi-protocol evaluation, including emerging tool variants (e.g., CobaltStrike, dnspot)
- Research on privacy-preserving forensic analysis, entropy/probabilistic features, protocol fingerprinting, and sequence-based classification
- Standardization of evaluation for new detection algorithms, model robustness, and transfer learning in DNS anomaly detection
The broad spectrum of DNS-Tunnel-Datasets supports not only protocol-level inference but also the development of lightweight, low-latency real-time detection systems for deployment on commodity infrastructure (Ali et al., 10 Dec 2025). The availability of both raw and derived feature corpora, multiple types of tunnel traffic, and open metrics establishes these datasets as central resources for DNS security research and engineering.