
Adversarial Malware Binaries

Updated 30 January 2026
  • Adversarial malware binaries are malicious executables that undergo minimal, functionality-preserving modifications to evade ML-based detectors.
  • They are crafted via both binary feature flips and raw-byte perturbations, using gradient-based, surrogate, and black-box methods to bypass sophisticated defenses.
  • Empirical studies reveal that constrained perturbations can achieve high evasion rates despite improvements in ensemble and certified defense techniques.

Adversarial malware binaries are malicious executables that have undergone carefully designed, functionality-preserving modifications in order to evade detection by ML-based malware detectors. Unlike adversarial examples in image domains, where small perturbations are measured by continuous norms such as L₂, adversarial malware binaries must comply with discrete and semantic constraints: the resulting binary must remain a valid, fully functional program while changing as little as possible in its representation to flip the detector's prediction. Research demonstrates that both raw-byte and feature-based ML detectors—including those equipped with advanced defenses—are highly susceptible to such attacks, particularly when the adversary leverages domain-specific knowledge of the executable format and detection model vulnerabilities (Jafari et al., 14 May 2025).

1. Formal Problem Definition and Attack Models

Adversarial malware binary generation is defined as a constrained optimization problem. For a given malicious executable x (expressed either as a raw byte vector x ∈ {0, …, 255}ⁿ or as a binary feature vector x ∈ {0, 1}ᵈ), the adversary seeks a perturbed version x′ such that:

  • x′ is functionally equivalent to x (i.e., it preserves malicious behavior and satisfies the semantic and syntactic constraints of the file format),
  • the detector f misclassifies x′ as benign: f(x′) < τ for some threshold τ,
  • the perturbation is minimal in some norm or measure (e.g., Hamming or Levenshtein distance, number of feature/byte edits).

In binary feature domains, the attack is commonly expressed as

    min over δ ∈ {−1, 0, 1}ᵈ of ‖δ‖₀  s.t.  x′ = x + δ ∈ {0, 1}ᵈ,  f(x′) < τ,  x′ ∈ F_valid,

where F_valid encodes domain-specific constraints (e.g., manifest consistency in APKs, PE structural integrity) (Jafari et al., 14 May 2025, Suciu et al., 2018, Kolosnjaji et al., 2018).

For raw-byte attacks, constraints typically include executable validity (e.g., preserving the PE header, not altering control flow) and may limit perturbations to append-only payloads or modification of “slack” regions (Suciu et al., 2018, Kolosnjaji et al., 2018).
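
To make the executable-validity constraint concrete, here is a minimal Python sketch (hypothetical helper name, stdlib only) of an append-only perturbation: bytes added past the original end-of-file are typically never mapped or executed, so the program's behavior is unchanged while the detector's input differs:

```python
def append_payload(binary: bytes, payload: bytes) -> bytes:
    """Append-only perturbation: overlay bytes past the original
    end-of-file are typically not mapped, so functionality is preserved."""
    if binary[:2] != b"MZ":  # DOS magic: must look like a PE-style file
        raise ValueError("not a PE binary")
    return binary + payload

# Toy example: a fake "binary" with the DOS magic, plus an adversarial payload.
original = b"MZ" + bytes(62) + b"\x90" * 16
adv = append_payload(original, b"\xde\xad\xbe\xef" * 8)

# Executable-validity constraints from the text:
assert adv[: len(original)] == original   # header and code untouched
assert adv[:2] == b"MZ"                   # PE magic preserved
```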

2. Attack Methodologies: Discrete, Feature, and Byte-Space Techniques

Binary Feature-Space Attacks

Gradient-based attacks in binary domains are challenging due to non-differentiability. Surrogate relaxations optimize over the continuous hypercube [0, 1]ᵈ using smooth proxies for Hamming distance and iteratively round solutions back to discrete values (Jafari et al., 14 May 2025). Techniques include:

  • Sigma-binary attack: Optimizes a relaxed objective combining detector loss and a smooth Hamming penalty, with adaptive sparsification to produce highly sparse, effective perturbations.
  • Prioritized Binary Rounding (PBR): Orders features by perturbation magnitude and gradient importance, flipping bits greedily until evasion is achieved (Jafari et al., 14 May 2025).
  • FGSM and coordinate ascent: Deterministic and randomized rounding after signed or saliency-driven bit flips (Al-Dujaili et al., 2018, Podschwadt et al., 2019).
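
A toy sketch of the greedy flipping idea behind PBR, assuming a white-box linear scorer (an illustration, not the paper's exact algorithm): rank candidate flips by how much they lower the detector score and apply them until the threshold τ is crossed or the budget is exhausted:

```python
def greedy_bit_flip(x, w, tau, budget):
    """Greedily flip the bits with the largest score-reducing impact,
    stopping once the linear detector score drops below tau or the
    perturbation budget is spent."""
    x = list(x)
    # For a linear scorer score(x) = sum(w_i * x_i), flipping bit i changes
    # the score by w_i * (1 - 2*x_i); rank flips by how much they lower it.
    order = sorted(range(len(x)), key=lambda i: w[i] * (1 - 2 * x[i]))
    flips = 0
    for i in order:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if score < tau or flips >= budget:
            break
        if w[i] * (1 - 2 * x[i]) < 0:  # only flips that reduce the score
            x[i] = 1 - x[i]
            flips += 1
    return x, flips

x = [1, 1, 0, 1, 0]              # malicious feature vector
w = [2.0, 1.5, -1.0, 0.5, 3.0]   # detector weights (malicious-leaning positive)
adv, n = greedy_bit_flip(x, w, tau=1.0, budget=3)
# Two flips suffice to drive the score from 4.0 below the threshold 1.0.
```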

In dynamic analysis, adversaries manipulate the sparse binary representation of runtime logs (e.g., API calls, strings) by enabling benign or disabling malicious features (Stokes et al., 2017).

Raw-Byte Attacks

For detectors operating on raw binary input (e.g., MalConv), attacks exploit architectural weaknesses such as global temporal max-pooling and heavy reliance on header bytes: gradient-optimized payloads are appended at end-of-file or written into slack regions between sections, and small edits to highly weighted header fields suffice to flip predictions (Suciu et al., 2018, Kolosnjaji et al., 2018, Kreuk et al., 2018, Demetrio et al., 2019).

Domain-Specific and Advanced Attacks

  • Instrumentation-based adversarial attacks for WebAssembly binaries inject dead-code gadgets (SE/OR) which carry adversarial payload bytes, guided by substitute models and gradient-based optimization on downsampled representations (Loose et al., 2023).
  • Obfuscation alignment via semantic-nop insertion creates executable variants nearly indistinguishable in image representation, providing high transferability, including to models using grayscale image malware representations (Park et al., 2019).

3. Structural and Empirical Analysis of Detector Vulnerabilities

Extensive studies confirm that both raw-byte and binary feature-based detectors exhibit brittle and highly exploitable decision boundaries:

  • Detectors relying on raw-byte CNNs (e.g., MalConv) are vulnerable to small, well-placed modifications in highly weighted header regions or aligned appended payloads that manipulate temporal max-pool activation (Kreuk et al., 2018, Demetrio et al., 2019).
  • Binary feature detectors, regardless of defensive architecture (outlier detectors, anomaly scores, input-convex networks), can suffer attack success rates exceeding 90% with fewer than 20 feature flips and virtually complete evasion under modest perturbation budgets when attacked by advanced methods such as sigma-binary (Jafari et al., 14 May 2025).
  • Adversarial robustness is often confined to the local ball around known attack types; expanding the attack method set (e.g., switching from gradient-based to genetic or RL-based perturbation) uncovers new blind spots (Ebrahimi et al., 2020, Rigaki et al., 2023).

Table: Attack Success Rates (ASR) of Key Defenses Under Sigma-Binary (Jafari et al., 14 May 2025)

Defense         ASR₁₀   ASR₂₀   ASR₅₀   ASR_∞
KDE             91.75   100     100     100
AT-rFGSM⁽ᵏ⁾     51.35   75      92.01   99.45
PAD-SMA         36.34   60      89.79   94.56

4. Defensive Mechanisms and Their Efficacy

Defensive strategies evaluated include:

  • Adversarial training: Incorporation of strong attacks in the training loop improves robustness to known attack patterns (e.g., lowering ASR₁₀ for PAD-SMA), but new attack types (e.g., sigma-binary) recover high evasion rates outside the training attack distribution (Jafari et al., 14 May 2025, Al-Dujaili et al., 2018, Podschwadt et al., 2019).
  • Detector ensembles: Multi-detector combinations (e.g., MalConv, Random Forest, ssdeep-LSH) with majority voting exhibit improved resistance; adversarial variants evading one model rarely transfer to others, and joint attacks are detected as anomalous “outliers” by transformation-detector modules (Salman et al., 2024).
  • Certified defense directions: Proposals include randomized smoothing adapted to discrete Hamming balls or provable invariants in binary feature spaces; current instantiations remain uncommon (Jafari et al., 14 May 2025).
  • Volatile feature elimination: Pre-processing pipelines that remove or zero-out all easily manipulated header, padding, and gap bytes, and robust section-local representations (GAT-based), yield substantial resilience to binary manipulation (Abusnaina et al., 2023).

It is consistently observed that non-robust detectors depend heavily on volatile features (headers, global byte-histograms, easily inserted sections), while more robust systems focus on monotonic, section-local, or cross-view invariant features (Abusnaina et al., 2023, Hu et al., 2022). Multi-view learning (e.g., ARMD) combining binary and source-code representations confers up to a sevenfold increase in robustness under black-box evasion (Hu et al., 2022).
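
The ensemble behavior described above can be sketched as follows (hypothetical interface; detector types and thresholds are illustrative): a majority vote decides the label, and split votes are flagged for a secondary outlier/transformation check:

```python
def ensemble_verdict(scores, thresholds):
    """Majority vote over heterogeneous detectors; strong disagreement
    among members is itself treated as an anomaly signal, mirroring the
    'outlier' behavior described in the text. Hypothetical interface."""
    votes = [s >= t for s, t in zip(scores, thresholds)]  # True = malicious
    malicious = sum(votes) > len(votes) / 2
    # An evasive sample often fools one model but not the others; flag
    # split votes for secondary (e.g., transformation/outlier) analysis.
    suspicious = 0 < sum(votes) < len(votes)
    return malicious, suspicious

# Three detectors (e.g., raw-byte CNN, feature-based RF, fuzzy-hash match):
# the adversarial sample evades the first model but not the other two.
verdict, flag = ensemble_verdict([0.2, 0.9, 0.8], [0.5, 0.5, 0.5])
```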

5. Evaluation, Transferability, and Empirical Findings

Adversarial malware binary attacks are characterized by:

  • Empirical evaluation on large datasets: Systematic evaluation frameworks now use malware corpora exceeding 20K samples, with balanced benign/malicious splits and hold-out test sets. Key metrics include attack success rate at constrained perturbation budgets, median Hamming distance, and detection performance (TPR/FPR) under attack (Jafari et al., 14 May 2025, Suciu et al., 2018).
  • Limited transferability: Unlike adversarial images, binaries crafted to evade one model (e.g., MalConv) rarely transfer to others (e.g., EMBER-RF, ssdeep) unless the perturbation magnitude becomes large enough to be easily detected as a generic outlier. Ensemble defenses thus substantially mitigate real-world attack efficacy (Salman et al., 2024).
  • Minimal perturbation strategies: Strong attacks can achieve successful evasion with as few as 4 feature flips on non-robust models, and 10 with advanced defenses (e.g., PAD-SMA) for 50% evasion (Jafari et al., 14 May 2025).
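
The budgeted success-rate metric (ASR₁₀, ASR₂₀, …) used above can be computed as in this sketch, assuming per-sample records of the minimal flip count that achieved evasion:

```python
def asr_at_budgets(results, budgets):
    """Attack success rate (percent) at constrained perturbation budgets.
    `results` holds, per attacked sample, the minimal number of feature
    flips that achieved evasion, or None if the attack failed."""
    rates = {}
    for k in budgets:
        hits = sum(1 for flips in results if flips is not None and flips <= k)
        rates[k] = 100.0 * hits / len(results)
    return rates

# Toy run over 8 samples (flip counts needed for evasion; None = not evaded):
results = [3, 4, 12, 18, 25, 60, None, 7]
rates = asr_at_budgets(results, budgets=[10, 20, 50])
# rates[10] counts samples evaded with at most 10 flips, i.e. ASR₁₀.
```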

6. Open Problems and Research Directions

Current research identifies several open problems:

  • Black-box attack scaling: White-box methods relying on gradients (e.g., sigma-binary) are highly effective but impractical in settings where gradients are unavailable. Surrogate model training, RL-guided sample manipulation, and genetic algorithms are active areas for query-efficient, black-box adversarial malware generation (Rigaki et al., 2023, Ebrahimi et al., 2020).
  • Certified robustness: Provable certificates protecting against all discrete manipulations within Hamming balls are not yet standard; research into randomized smoothing and combinatorial abstraction for malware remains ongoing (Jafari et al., 14 May 2025).
  • Semantic and compositional constraints: Incorporating control-flow graph, manifest, or structural integrity checks into adversarial training is necessary for closing the gap between feature-space and real-world executable constraints (Jafari et al., 14 May 2025).
  • Section-level and multi-view detection: Per-section, monotonic, or graph-structured feature representations show promise against binary-level attacks, particularly for PE/ELF binaries where attackers otherwise manipulate volatile fields (Abusnaina et al., 2023). Multi-view (binary plus disassembly/source code) architectures further mitigate attack transfer (Hu et al., 2022).
  • Adaptive arms race: As new attacks (e.g., sigma-binary, RL-guided mutation) reveal blind spots in detectors and adversarial training methods, continued adversary-defender co-evolution is a central challenge.
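
As a sketch of the randomized-smoothing direction for discrete features (prediction only; deriving a certified Hamming radius requires the formal analysis cited above), one can majority-vote over randomly bit-flipped copies of the input:

```python
import random

def smoothed_predict(f, x, flip_prob=0.05, n_samples=200, seed=0):
    """Sketch of randomized smoothing adapted to binary features:
    classify many randomly bit-flipped copies of x and return the
    majority label. Certification would need the formal analysis cited
    in the text; this shows only the prediction procedure."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_samples):
        noisy = [xi ^ (rng.random() < flip_prob) for xi in x]
        votes += f(noisy)  # f returns 1 for malicious, 0 for benign
    return int(votes > n_samples / 2)

# Toy base classifier: flags samples with >= 3 active 'malicious' features.
f = lambda v: int(sum(v[:5]) >= 3)
label = smoothed_predict(f, [1, 1, 1, 1, 0, 0, 0, 0])
```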

7. Summary Table: Principal Techniques for Generating Adversarial Malware Binaries

Attack Family                      Domain          Optimization Mode        Typical Perturbation          Reference
Sigma-binary, PBR                  Binary feature  White-box, gradient      Feature flips (4–20 bits)     (Jafari et al., 14 May 2025)
Gradient append, FGSM              Raw bytes       White-box, gradient      EOF or slack byte updates     (Suciu et al., 2018)
Header gradient/IG                 Raw bytes       White-box, gradient      DOS header bytes (<100)       (Demetrio et al., 2019)
GAMMA (genetic padding/injection)  Raw bytes       Black-box, search        Section padding/injection     (Dasgupta et al., 2021)
MalRNN                             Raw bytes       Black-box, seq2seq       Benign-like overlay append    (Ebrahimi et al., 2020)
MEME                               Raw bytes       Black-box, RL/surrogate  Multi-step PE modifications   (Rigaki et al., 2023)
Instrumentation gadgets            Wasm bytecode   Grey-box, substitute     Dead-code/dead-data bytes     (Loose et al., 2023)

These collectively illustrate that adversarial malware binaries present a severe challenge to both conventional and state-of-the-art ML-based malware defenses, necessitating continued development of structurally aware training, robust feature engineering, and adaptive ensemble approaches. The ongoing arms race between adversarial binary generation and robust ML malware detection frameworks remains an active frontier (Jafari et al., 14 May 2025).

