StriderSPD: Binary Security Patch Detection
- StriderSPD is a framework that fuses control-flow graph embeddings and pseudo-code semantics to accurately detect security patches in closed-source binaries.
- It employs a novel two-stage training regime, first fine-tuning an LLM with LoRA adapters and then optimizing graph and cross-attention modules for stable fusion.
- Experimental results show state-of-the-art performance with 85.4% accuracy, significant F1 improvements, and notably lower false-positive rates compared to prior methods.
Security Patch Detection (SPD) facilitates the identification of security-relevant changes in software, a critical task in environments where vendors silently patch binaries without source-level disclosure. The Structure-Guided Binary SPD (StriderSPD) framework addresses the unique challenges of detecting security patches within closed-source binaries—where traditional methods that depend on source code or isolated assembly or pseudo-code abstractions are inherently limited—by fusing structured control-flow information with high-level semantic cues via joint deep representation learning (Li et al., 9 Jan 2026).
1. Problem Domain and Challenges
Binary SPD in closed-source settings is defined by the requirement to classify changes between pairs of stripped binaries—pre- and post-patch—for vulnerability-fixing content in the absence of source code artifacts. Compilers obscure source-level semantics and structure, while real-world patches may combine security fixes and unrelated modifications such as refactorings. Disassembly to assembly exposes control-flow structure (e.g., basic blocks, address jumps) but omits semantic information, whereas decompilation restores some high-level context but lacks a parser-compatible, explicit program structure. Neither view alone suffices for robust, accurate determination of security-fix intent, especially when security changes are subtle or interleaved with noise.
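The structural view can be made concrete with a toy control-flow graph; the representation below (basic blocks as nodes, control transfers as directed edges) is purely illustrative and not the paper's data format.

```python
# Toy control-flow graph for a patched function: nodes are basic blocks,
# edges are control transfers. The "check_null" block models a typical
# security fix (an added guard condition with an early exit).
cfg_post = {
    "entry":      ["check_null"],
    "check_null": ["early_exit", "body"],   # new guard introduces a branch
    "body":       ["exit"],
    "early_exit": ["exit"],
    "exit":       [],
}

def edge_count(cfg):
    """Number of control transfers in the CFG."""
    return sum(len(succs) for succs in cfg.values())
```

Structurally, a security patch of this kind surfaces as extra nodes and edges relative to the pre-patch CFG, even when the pseudo-code diff is small.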
2. StriderSPD Architecture
StriderSPD integrates structural and semantic representations through a two-branch neural architecture, with fusion occurring inside the attention layers of an LLM.
- Graph Branch: Receives functions lifted to assembly, constructs control-flow graphs (CFGs) with nodes as basic blocks and edges as control transfers, and produces graph embeddings via a Gated Graph Convolutional Network (GGCN).
- LLM Branch: Processes the same function pair after decompilation to pseudo-code, formats inputs as an instruction template (e.g., “Given these two C-like functions, do they fix a vulnerability? Answer yes/no.”), and encodes them with a code-oriented LLM (such as Qwen3-8B) into token embeddings.
- Integration: Structure is injected at the token level within each LLM self-attention layer through dedicated adapters (Query, Key, Value adapters), a gating network, and cross-attention. This design enables the LLM to attend to both pseudo-code token sequences and structurally salient cues (notably, those reflecting modifications like guard conditions or early exits).
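As a toy illustration of the LLM branch's input formatting, a prompt builder might look as follows; the template wording is an assumption extrapolated from the example instruction above, not the paper's exact template.

```python
# Hypothetical prompt builder for the LLM branch; the template wording is
# an assumption based on the example instruction in the text.
def build_prompt(pre_patch: str, post_patch: str) -> str:
    """Format a decompiled pre/post function pair as a yes/no instruction."""
    return (
        "Given these two C-like functions, do they fix a vulnerability? "
        "Answer yes/no.\n\n"
        f"### Pre-patch:\n{pre_patch}\n\n"
        f"### Post-patch:\n{post_patch}\n"
    )

prompt = build_prompt(
    "int f(char *p) { return p[0]; }",
    "int f(char *p) { if (!p) return -1; return p[0]; }",
)
```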
3. Graph-LLM Fusion: Adapter and Gating Mechanisms
Adapter modules serve to align the graph-derived embedding with the LLM’s internal representation spaces:
- Processing Pipeline:
  - The GGCN's pooled embedding $h_G$ is gated by $g = \sigma(W_g h_G)$, a gate in $(0, 1)$ computed from the embedding itself.
  - Three independent feed-forward networks (FFNs) yield $s_Q = g \odot \mathrm{FFN}_Q(h_G)$, $s_K = g \odot \mathrm{FFN}_K(h_G)$, and $s_V = g \odot \mathrm{FFN}_V(h_G)$ for the Query, Key, and Value roles, each modulated by the gate.
  - Fusion is executed as follows:

$$Q' = Q \oplus s_Q, \qquad K' = K \parallel s_K, \qquad V' = V \parallel s_V,$$

where "$\oplus$" denotes broadcasting $s_Q$ across token positions followed by element-wise addition, and "$\parallel$" indicates concatenation across tokens.
- Semantics: This approach maintains one-to-one alignment for Queries and appends structure-enriched tokens to Keys and Values, enabling attention heads to condition on both semantic and structural signals when forming latent representations.
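The fusion step can be sketched in a few lines of NumPy. The shapes, the scalar gate, and the two-layer FFNs below are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

# Minimal sketch of the adapter/gating fusion: a gated graph embedding is
# broadcast-added to Queries and appended as one extra token to Keys/Values.
rng = np.random.default_rng(0)
n_tok, d, d_g, d_h = 8, 16, 4, 32   # tokens, LLM dim, graph dim, FFN hidden dim

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ffn(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)   # two-layer FFN: d_g -> d_h -> d

h_G = rng.normal(size=d_g)                        # pooled GGCN embedding
Q, K, V = (rng.normal(size=(n_tok, d)) for _ in range(3))

g = sigmoid(rng.normal(size=d_g) @ h_G)           # scalar gate in (0, 1)

# Three independent FFNs produce Query/Key/Value structure vectors,
# each modulated by the gate.
adapters = [(rng.normal(size=(d_h, d_g)), rng.normal(size=(d, d_h)))
            for _ in range(3)]
s_Q, s_K, s_V = (g * ffn(h_G, W1, W2) for W1, W2 in adapters)

Q_f = Q + s_Q                   # broadcast add: one-to-one token alignment kept
K_f = np.vstack([K, s_K])       # one appended structure-enriched token
V_f = np.vstack([V, s_V])
```

Note the asymmetry: Queries keep their length (so token alignment is preserved), while Keys and Values each gain one structure token that every attention head can attend to.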
4. Two-Stage Training Regime
Owing to a 100× parameter gap between the LLM and the lightweight graph branch, naive co-training is suboptimal due to gradient interference and instability. StriderSPD employs a two-stage optimization:
- Stage 1: Fine-tune only the LLM (using LoRA adapters) on pseudo-code instructions, with cross-entropy loss on the generated "yes/no" tokens:

$$\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{t}\log p_{\theta}\!\left(y_t \mid x,\, y_{<t}\right),$$

where $\theta$ are the LoRA parameters and $x$ is the instruction with the corresponding pseudo-code.
- Stage 2: Freeze the LLM; train the graph, adapter, and cross-attention parameters end-to-end with binary cross-entropy, encouraging the graph branch to inject structure that enhances detection:

$$\mathcal{L}_{\mathrm{BCE}}(\phi) = -\left[\,y \log \hat{y} + (1 - y)\log(1 - \hat{y})\,\right],$$

where $\phi$ are the graph, adapter, and cross-attention parameters, $\hat{y} = \sigma(z)$ is the predicted probability of a security fix, and $\sigma$ is the sigmoid.
This regimen ensures semantic anchors are established before structure-guided adaptation, stabilizing multi-branch optimization.
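A schematic of the two-stage schedule is sketched below; the parameter-group names ("lora", "ggcn", "adapters", "cross_attn") are illustrative labels, not the paper's identifiers.

```python
import math

# Which parameter groups receive gradients in each stage. In Stage 1 only
# the LoRA adapters update; in Stage 2 the LLM is fully frozen and only
# the graph branch, adapters, and cross-attention train.
def trainable_groups(stage: int) -> set:
    if stage == 1:
        return {"lora"}
    return {"ggcn", "adapters", "cross_attn"}

def bce(y: int, y_hat: float) -> float:
    """Stage-2 binary cross-entropy on the sigmoid output y_hat."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
```

Keeping the large LLM frozen in Stage 2 means the small graph branch never competes with it for gradient signal, which is the stated motivation for the two-stage split.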
5. Binary SPD Benchmark: Realistic, Disjoint Evaluation
Existing SPD datasets often suffer from project/domain overlap between train and test splits, misrepresenting closed-source settings. StriderSPD’s evaluation corpus is constructed as follows:
- Datasets from prior work (Linux, FFmpeg, Git, PHP, Libav) are excluded.
- Five domains (ImageMagick, TcpDump, QEMU, Radare2, Slurm), none of which appears in existing SPD corpora, provide the test projects.
- Manual inspection of 1,068 source files yields 1,720 binary function pairs (1,010 security fixes, 710 non-security), with disjointness at both the project and domain levels.
- Patches are compiled at multiple optimization levels (O0, O1, O2, O3, Os), followed by disassembly to assembly (for CFG extraction) and decompilation to pseudo-code.
This benchmark enforces zero train-test overlap, presenting a rigorous closed-source detection scenario.
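Crossing every function pair with every optimization level multiplies the benchmark's compilation targets; a minimal sketch of that expansion follows (file names and the `cflags` field are hypothetical placeholders).

```python
# Expand source-level patch pairs into per-optimization compilation targets.
OPT_LEVELS = ["O0", "O1", "O2", "O3", "Os"]

def expand_targets(pairs):
    """Cross each (pre, post) source pair with every optimization level."""
    return [
        {"pre": pre, "post": post, "cflags": f"-{lvl}"}
        for pre, post in pairs
        for lvl in OPT_LEVELS
    ]

targets = expand_targets([("func_pre.c", "func_post.c")])
# 1 pair x 5 levels -> 5 compilation targets
```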
6. Experimental Results and Ablations
StriderSPD achieves state-of-the-art performance on the cross-project, cross-domain benchmark:
| Method/Setting | Accuracy | F1 | False-Positive Rate |
|---|---|---|---|
| StriderSPD | 0.854 | 0.885 | 0.293 |
| Yi-Coder-9B-Chat (best prior) | 0.758 | — | 0.477 |
- The framework displays consistent improvements in accuracy and F1 across all compiler optimization levels, counteracting the increased CFG and pseudo-code variability introduced at higher settings (O1–O3).
- Generalizability is demonstrated through accuracy gains with ten different open-source LLM backbones (e.g., +32.8% on Qwen3-8B, +22.7% on DeepSeek-7B).
- Ablation studies confirm the necessity and synergy of both branches, as well as the critical role of adapters, gating, cross-attention, and the two-stage training regime. Omitting the graph branch, LLM branch, adaptive fusion, adapters, gate, cross-attention, or two-stage regime reduces accuracy by 17.8–30% and severely increases false positives (e.g., FPR nearly doubling in the absence of adaptive fusion).
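For reference, the three reported metrics reduce to standard confusion-matrix formulas; the counts in the usage line below are invented toy numbers, not the paper's data.

```python
# Accuracy, F1, and false-positive rate from confusion-matrix counts.
def metrics(tp, fp, tn, fn):
    acc  = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    rec  = tp / (tp + fn)
    f1   = 2 * prec * rec / (prec + rec)
    fpr  = fp / (fp + tn)   # non-security pairs wrongly flagged as fixes
    return acc, f1, fpr

# Toy example (invented counts, for illustration only):
acc, f1, fpr = metrics(tp=8, fp=1, tn=9, fn=2)
```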
A plausible implication is that structure-guided cross-modal fusion at the token level is essential for robust SPD in settings deprived of source-level artifacts.
7. Significance and Outlook
StriderSPD is the first framework to unify assembly-level control-flow structure with pseudo-code semantics at the granularity of individual LLM tokens for binary SPD. It establishes a methodologically rigorous and empirically validated standard for closed-source security patch detection with a benchmark constructed for maximal domain disjointness. Its advances in joint graph–LLM fusion, adapter-based token-level alignment, and stable two-stage multi-branch training address key open problems in learning from lossy, ambiguous binary artifacts. These architectural and experimental findings provide a foundation for subsequent work in security patch analysis and representation learning for program binaries (Li et al., 9 Jan 2026).