LTL Model Checking for Self-Modifying Code
- The paper introduces a self-modifying pushdown system (SM-PDS) that extends classical PDS to capture dynamic code mutations for LTL model checking.
- It employs a symbolic saturation method and reduces the LTL checking problem to the emptiness problem for self-modifying Büchi pushdown systems (SM-BPDS) with proven EXPTIME-completeness.
- Experimental evaluations report that the tool achieves a 100% detection rate on the malware benchmarks while outperforming traditional PDS-based analyzers in speed and resource usage.
Self-modifying code refers to programs capable of altering their own instructions dynamically during execution, a feature extensively leveraged in malware to obfuscate behaviors and evade analysis. The verification challenge posed by the evolving nature of self-modifying code necessitates formal models capable of capturing both stack-based control flow and dynamic program mutation. One approach, as formalized in (Touili et al., 2019), extends pushdown systems (PDS) to the class of self-modifying pushdown systems (SM-PDS) and addresses the problem of Linear Temporal Logic (LTL) model checking for such systems. The developed framework further establishes a reduction to the emptiness problem for self-modifying Büchi pushdown systems (SM-BPDS), with algorithms and tool support validated on self-modifying malware benchmarks.
1. Formal Model: Self-Modifying Pushdown Systems
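A Self-Modifying Pushdown System (SM-PDS) is defined as a quadruple $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ where $P$ is a finite set of control-locations, $\Gamma$ is a finite stack alphabet, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ denotes standard pushdown rules, written $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$, and $\Delta_c$ encodes self-modifying rules, written $p \stackrel{(r_1, r_2)}{\hookrightarrow} p'$, each of which replaces a rule $r_1$ by a rule $r_2$.
The semantics operate over configurations $\langle p, w, \theta \rangle$, where $p \in P$, $w \in \Gamma^*$, and $\theta \subseteq \Delta \cup \Delta_c$ is the current active phase (set of rules). Execution proceeds via two step types:
- Standard rule: $\langle p, \gamma w', \theta \rangle \Rightarrow \langle p', w w', \theta \rangle$ for $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \theta \cap \Delta$.
- Self-modification: $\langle p, w, \theta \rangle \Rightarrow \langle p', w, \theta' \rangle$ if $p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \theta \cap \Delta_c$ with $r_1 \in \theta$ and $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$.
Setting $\Delta_c = \emptyset$ yields a classical PDS.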
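The two step types can be made concrete with a small executable sketch. This is an illustration only, not the authors' implementation: the class names `Rule` and `ModRule`, the function `successors`, and the sample rules are hypothetical, and the sketch assumes that self-modifying rules only swap standard rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Rule:
    """Standard rule <p, gamma> -> <q, push>: the top symbol is replaced by `push`."""
    p: str
    gamma: str
    q: str
    push: Tuple[str, ...]


@dataclass(frozen=True)
class ModRule:
    """Self-modifying rule p -(old, new)-> q (assumed here to swap standard rules)."""
    p: str
    old: Rule
    new: Rule
    q: str


Phase = FrozenSet[Union[Rule, ModRule]]
Config = Tuple[str, Tuple[str, ...], Phase]  # (control-location, stack, phase)


def successors(config: Config):
    """Return all one-step successors of a configuration."""
    p, stack, phase = config
    result = []
    for r in phase:
        if isinstance(r, Rule):
            # Standard step: the stack top must match; the phase is unchanged.
            if r.p == p and stack and stack[0] == r.gamma:
                result.append((r.q, r.push + stack[1:], phase))
        elif r.p == p and r.old in phase:
            # Self-modification: the stack is unchanged; `old` is swapped for `new`.
            result.append((r.q, stack, (phase - {r.old}) | {r.new}))
    return result


# Hypothetical rules: at p1 the program rewrites the instruction stored at p2.
push9 = Rule("p2", "g", "p3", ("a9", "g"))
jmp9 = Rule("p2", "g", "p9", ("g",))
mod = ModRule("p1", push9, jmp9, "p2")
c0 = ("p1", ("g",), frozenset({push9, mod}))
c1 = successors(c0)[0]
print(c1[0], jmp9 in c1[2], push9 in c1[2])  # p2 True False
```

Keeping the phase as a `frozenset` inside the configuration makes configurations hashable, which is convenient for any later search over reachable configurations.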
2. Büchi Acceptance and Emptiness for SM-BPDS
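The notion of Büchi acceptance is incorporated by extending SM-PDS to SM-BPDS: $\mathcal{BP} = (P, \Gamma, \Delta, \Delta_c, G)$, with $G \subseteq P$ specifying the Büchi-accepting control-locations. Runs are infinite sequences of configurations, and a run is accepting if it visits locations in $G$ infinitely often.
Head and Repetition: A head is a tuple $(p, \gamma, \theta)$ consisting of a control-location, a topmost stack symbol, and a phase. A head is repeating if, for some stack-suffix $v \in \Gamma^*$, $\langle p, \gamma, \theta \rangle \Rightarrow^{+} \langle p, \gamma v, \theta \rangle$ along a path traversing at least one configuration with control-location in $G$.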
The emptiness problem for SM-BPDS amounts to deciding whether a repeating head is reachable from a given initial configuration.
3. Reduction of LTL Model Checking to Emptiness
For an SM-PDS $\mathcal{P}$ and a labeling function $\nu : P \to 2^{AP}$ (where $AP$ is a set of atomic propositions), a configuration satisfies an LTL formula $\varphi$ if some run from it, projected through $\nu$, yields a model of $\varphi$. Following the automata-theoretic approach, a nondeterministic Büchi automaton $\mathcal{B}_\varphi$ is constructed for $\varphi$. The product system, denoted $\mathcal{BP}_\varphi$, has control-locations that pair a control-location of $\mathcal{P}$ with a state of $\mathcal{B}_\varphi$, and its transitions are formed on the fly from the SM-PDS and $\mathcal{B}_\varphi$, including correct treatment of the self-modifying ($\Delta_c$) rules.
Correctness (Theorem 4): A configuration $c$ of $\mathcal{P}$ satisfies $\varphi$ if and only if, in the product SM-BPDS, the corresponding initial configuration admits an infinite accepting run. Thus, LTL model checking for SM-PDS reduces in polynomial time to the SM-BPDS emptiness problem (Touili et al., 2019).
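One way to picture the synchronization is at the level of configurations rather than rules. The sketch below is a simplification under stated assumptions, not the paper's rule-level construction: `product_steps`, `is_accepting`, the callback `sm_steps` (which stands in for the SM-PDS step relation), and the toy automaton are all hypothetical names. The Büchi automaton reads the label of the current control-location while the SM-PDS takes one step.

```python
def product_steps(config, sm_steps, buchi_delta, labeling):
    """Successors of a product configuration ((p, q), stack, phase).

    sm_steps(p, stack, phase) yields SM-PDS successors (p2, stack2, phase2);
    buchi_delta maps (q, letter) to a set of successor automaton states.
    """
    (p, q), stack, phase = config
    letter = labeling[p]  # atomic propositions that hold at p
    return [((p2, q2), stack2, phase2)
            for q2 in sorted(buchi_delta.get((q, letter), ()))
            for (p2, stack2, phase2) in sm_steps(p, stack, phase)]


def is_accepting(config, buchi_final):
    """A product location (p, q) is accepting when q is a final automaton state."""
    (_, q), _, _ = config
    return q in buchi_final


def toy_steps(p, stack, phase):
    # Hypothetical system: p0 -> p1 and p1 -> p1, stack and phase unchanged.
    return [("p1", stack, phase)]


CALL, NONE = frozenset({"call_api"}), frozenset()
labeling = {"p0": NONE, "p1": CALL}
# Automaton for "eventually call_api": q0 waits, q1 is the accepting state.
buchi_delta = {
    ("q0", NONE): {"q0"}, ("q0", CALL): {"q0", "q1"},
    ("q1", NONE): {"q1"}, ("q1", CALL): {"q1"},
}
start = (("p0", "q0"), ("a",), frozenset())
first = product_steps(start, toy_steps, buchi_delta, labeling)
second = product_steps(first[0], toy_steps, buchi_delta, labeling)
print([is_accepting(c, {"q1"}) for c in second])  # [False, True]
```

Because the phase travels unchanged through the pairing, self-modifying steps need no special case here; the paper's rule-level product has to handle them explicitly.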
4. Algorithmic Approach: The Saturation Method and Complexity
The emptiness problem is addressed via a finite head-reachability graph, whose vertices are the possible heads and whose edge labels (0 or 1) indicate whether an accepting control-location is visited along the corresponding path. Cycles with at least one “1”-labeled edge correspond to repeating heads.
Computation: A symbolic saturation (fixed-point) algorithm builds automata recognizing predecessor heads, annotated with Büchi bits. Transitions, arising either from standard rules or from self-modifications, are added until a global fixpoint is reached. The computation is polynomial in some parameters of the system and exponential in others; the exponential factor reflects the number of phases, since each phase is a set of rules.
Complexity Bound: The overall running time for deciding SM-PDS LTL model checking is exponential in the worst case, placing the problem in EXPTIME. The problem is EXPTIME-complete.
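The cycle criterion can be checked with a strongly connected component (SCC) computation: a 1-labeled edge lies on a cycle exactly when both of its endpoints belong to the same SCC. The sketch below is a minimal illustration using a plain recursive Tarjan pass followed by an edge scan; the function names and the edge-list encoding `(u, v, bit)` are assumptions, and the recursion suits small graphs only.

```python
from collections import defaultdict


def scc_ids(nodes, succ):
    """Tarjan's algorithm: map each node to a representative of its SCC."""
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop its members
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = v
                if w == v:
                    break

    for v in nodes:
        if v not in index:
            visit(v)
    return comp


def has_repeating_head(edges):
    """edges: list of (u, v, bit). True iff some cycle contains a 1-labeled edge."""
    succ, nodes = defaultdict(list), set()
    for u, v, _ in edges:
        succ[u].append(v)
        nodes.update((u, v))
    comp = scc_ids(nodes, succ)
    # An edge inside one SCC always lies on a cycle (self-loops included).
    return any(bit == 1 and comp[u] == comp[v] for u, v, bit in edges)


on_cycle = [("h0", "h1", 0), ("h1", "h2", 1), ("h2", "h1", 0)]
off_cycle = [("h0", "h1", 1), ("h1", "h2", 0), ("h2", "h1", 0)]
print(has_repeating_head(on_cycle), has_repeating_head(off_cycle))  # True False
```

For emptiness, only cycles reachable from the head of the initial configuration matter, so a real checker would restrict the graph to that reachable part first.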
5. Implementation and Practical Procedures
A prototype tool implements the SM-PDS LTL model checking approach, with the following architecture:
- Disassembly and abstraction: Binaries are disassembled (via Jakstab), control-flow recovered, and a conservative approximation of indirect jump targets is computed to build the SM-PDS instance.
- Automata construction: Büchi automata for LTL properties are generated using LTL2BA.
- Symbolic product formation: Product SM-BPDS is built on the fly to avoid materializing exponential numbers of phases.
- Saturation procedure: A fixpoint computation over symbolic automata determines head reachability, with stack and phase transitions stored in adjacency lists.
- Cycle detection: 1-labeled cycles in the head-reachability graph are sought via Tarjan’s SCC algorithm augmented to detect Büchi visits.
The incremental symbolic nature of the algorithm avoids explicit enumeration of all phases, yielding practical efficiency superior to worst-case complexity in typical scenarios (Touili et al., 2019).
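The gap between worst case and practice can be illustrated by exploring phases lazily. The sketch below is a deliberately simplified model, not the tool's symbolic algorithm: it ignores control-locations and stacks (so it over-approximates), represents rules by hypothetical names, and only follows the phase changes that modifying rules enable from the initial phase.

```python
from collections import deque


def reachable_phases(initial, mods):
    """Breadth-first exploration of phases.

    initial: frozenset of rule names forming the initial phase.
    mods: dict mapping a modifying rule's name to its (old, new) pair.
    A modifying rule fires only if both it and `old` are in the current phase.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        phase = queue.popleft()
        for name, (old, new) in mods.items():
            if name in phase and old in phase:
                nxt = (phase - {old}) | {new}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


# Hypothetical system: four rules in total, one of them self-modifying.
mods = {"m1": ("r_push", "r_jmp")}
initial = frozenset({"r_push", "r_other", "m1"})
phases = reachable_phases(initial, mods)
print(len(phases), "reachable phases out of", 2 ** 4, "rule subsets")
```

Here only 2 of the 16 possible rule subsets are ever visited, which conveys why on-the-fly construction can stay far below the exponential bound.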
6. Experimental Evaluation: Malware Detection and Benchmarking
The tool was benchmarked in three principal settings:
| Setting | Samples/Remarks | Outcome |
|---|---|---|
| SM-PDS LTL checker vs. PDS+Moped | Synthetic PDSs | SM-PDS tool finishes in seconds, whereas PDS+Moped needs minutes to hours; Moped often exhausts memory. |
| Self-modifying malware detection | 892 binaries (VirusShare, MalShare, VX-Heaven, NGVCK, benign XP) | 100% detection of matching malware, 0 false positives (benign marked safe); runtime min |
| Comparison with commercial antivirus | 205 fresh NGVCK self-modifying worms | No commercial AV detects all 205; tool detects 100% [Table IV, (Touili et al., 2019)] |
Included LTL properties formalize patterns such as registry-key injection, data-stealing, spy-worm activity, and appending virus behavior. Modeling self-modification in the SM-PDS semantics allows correct reachability analysis, which classical PDS-based (or static CFG) checkers cannot achieve when code mutation is present.
7. Illustrative Example: Encoding and Verification Workflow
As an explicit illustration, a toy self-modifying binary is mapped as follows. Addresses represent control-locations; for example, a `mov [0x2], 0x0c` instruction replaces the rule for `push 0x9` at address $0x2$ with that for `jmp 0x9`, which is encoded as the $\Delta_c$ rule $p_3 \stackrel{(r_{\mathsf{push\ 0x9}},\, r_{\mathsf{jmp\ 0x9}})}{\hookrightarrow} p_4$. Stack symbols correspond to return addresses. The resulting SM-PDS, leveraging this explicit mutation semantics, allows the LTL checker to detect paths, such as to a `CopyFileA` call introduced dynamically, that would otherwise be missed if self-modifying effects were ignored. This capacity to analyze dynamic behavioral patterns is critical for sound detection in security-centric applications (Touili et al., 2019).
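The effect of modeling the mutation can be reproduced on a tiny system. The sketch below is hypothetical beyond the single modifying rule from $p_3$ to $p_4$: the remaining rules, the bottom symbol `BOT`, the placement of the `CopyFileA` call at `p9`, and the bounded search are assumptions made for illustration. It compares the locations reachable when `mov` is modeled as a self-modifying rule with those reachable when it is treated as a plain fall-through.

```python
from collections import deque


def steps(config):
    """One-step successors; rules are ("std", p, top, q, push) or ("mod", p, old, new, q)."""
    p, stack, phase = config
    for rule in phase:
        if rule[0] == "std":
            _, src, top, dst, push = rule
            if src == p and stack and stack[0] == top:
                yield (dst, push + stack[1:], phase)
        else:
            _, src, old, new, dst = rule
            if src == p and old in phase:
                yield (dst, stack, (phase - {old}) | {new})


def reachable_locations(start, limit=1000):
    """Bounded breadth-first search; returns the control-locations visited."""
    seen, queue = {start}, deque([start])
    while queue and len(seen) < limit:
        for nxt in steps(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {p for p, _, _ in seen}


BOT = "bot"  # hypothetical bottom-of-stack symbol
r_push = ("std", "p2", BOT, "p5", ("a9", BOT))   # push 0x9 at address 0x2
r_jmp = ("std", "p2", BOT, "p9", (BOT,))         # jmp 0x9; p9 holds the CopyFileA call
r_back = ("std", "p4", BOT, "p2", (BOT,))        # hypothetical jump back to 0x2
mov_as_mod = ("mod", "p3", r_push, r_jmp, "p4")  # mov modeled as self-modification
mov_as_nop = ("std", "p3", BOT, "p4", (BOT,))    # mov with its mutation ignored

with_mod = reachable_locations(("p3", (BOT,), frozenset({r_push, r_back, mov_as_mod})))
without = reachable_locations(("p3", (BOT,), frozenset({r_push, r_back, mov_as_nop})))
print("p9" in with_mod, "p9" in without)  # True False
```

The location `p9` is reached only in the first model, which mirrors the point that a checker blind to code mutation misses the dynamically introduced call.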