HObfNET: Hierarchical Obfuscation Score Surrogate
- HObfNET is a hierarchical neural network surrogate that estimates smart contract obfuscation scores with near parity to ObfProbe (PCC 0.9158, MAPE 8.20%).
- It leverages a four-stage Hierarchical Attention Network combining local and global Transformer encoding to achieve orders-of-magnitude speedup over traditional static analysis.
- The system generalizes across Ethereum, BSC, Polygon, and Avalanche using chain-specific thresholds for accurate audit queueing and incident-response workflows.
HObfNET is an efficient hierarchical neural network designed as a surrogate for ObfProbe, the leading static-analysis tool for quantifying smart-contract obfuscation. It enables scalable, bytecode-driven inference of obfuscation scores across Ethereum, BSC, Polygon, and Avalanche ecosystems, serving operational security teams by powering automated cross-chain audit queueing and incident-response workflows. HObfNET achieves near parity with ObfProbe’s Z-score outputs (PCC 0.9158, MAPE 8.20%) while realizing orders-of-magnitude efficiency gains.
1. Architectural Overview and Surrogate Function
HObfNET targets regression of the canonical obfuscation score (Z-score per ObfProbe) from raw smart-contract bytecode. The architecture employs a four-stage Hierarchical Attention Network (HAN):
- Normalization and Segmentation: Bytecode inputs are canonicalized (removal of “0x” prefix, compiler metadata, constructor code) and partitioned into fixed-length segments of bytes.
- Local Encoding: Each segment passes through a 2-layer Transformer encoder (dimension , 4 heads, dropout $0.1$) with bytewise embedding and local positional encoding.
- Global Encoding: Encoded segments are further processed by a global 2-layer Transformer with chunk-level positional embeddings and bidirectional attention to capture cross-segment, contract-level structure.
- Multi-task Output Head: The pooled contract representation, the segment-wise reconstructed feature vector, and an auxiliary Z-score are fused via an MLP to yield the final obfuscation score.
This surrogate eliminates the reliance on computation-intensive static single-assignment analysis; at training time, labels exist only for Ethereum, but the model generalizes inference to other chains without auxiliary labels (Zhao et al., 24 Jan 2026).
2. Input Feature Processing and Data Management
Bytecode preprocessing canonicalizes the input and maps each byte to an integer token ($1$–$256$, with $0$ reserved for padding), giving a vocabulary size of $257$. Segmentation produces uniform-length chunks, with a validity mask marking which positions hold real data rather than padding.
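The tokenization and segmentation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each byte value `b` maps to token `b + 1` so that `0` stays free for padding, it takes `segment_len` as a hypothetical parameter (the actual segment length is not given here), and it only strips the `0x` prefix, leaving metadata and constructor-code removal out of scope.

```python
from typing import List, Tuple

PAD_TOKEN = 0  # padding id, as stated in the text


def tokenize_and_segment(
    bytecode_hex: str, segment_len: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Map bytecode to integer tokens and split it into fixed-length segments.

    `segment_len` is a hypothetical parameter chosen by the caller.
    Returns (segments, masks); mask entries are 1 for real bytes, 0 for padding.
    """
    if bytecode_hex.startswith(("0x", "0X")):
        bytecode_hex = bytecode_hex[2:]
    raw = bytes.fromhex(bytecode_hex)
    # Assumption: shift every byte value by one so that 0 is reserved for padding.
    tokens = [b + 1 for b in raw]

    segments: List[List[int]] = []
    masks: List[List[int]] = []
    for start in range(0, len(tokens), segment_len):
        chunk = tokens[start:start + segment_len]
        pad = segment_len - len(chunk)
        segments.append(chunk + [PAD_TOKEN] * pad)
        masks.append([1] * len(chunk) + [0] * pad)
    return segments, masks


segments, masks = tokenize_and_segment("0x6080604052", segment_len=4)
```

The mask is what later lets a pooling step ignore the padded tail of the last segment.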
Data splits are made at the “bytecode-family” cluster level to prevent label leakage via near-duplicates, using a $7:2:1$ train/validation/test split across 1.04M Ethereum contracts. The cross-chain datasets (BSC, Polygon, Avalanche) lack ObfProbe supervision and are scored by HObfNET only at inference time.
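One way to keep near-duplicates on the same side of a split is to assign whole families rather than individual contracts. The sketch below is an assumption-laden illustration: the `family_of` mapping and its family ids are hypothetical, and hashing the family id into ten buckets only approximates the $7:2:1$ ratio by family count.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def split_by_family(family_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign whole bytecode families to train/val/test in roughly 7:2:1 proportion.

    `family_of` maps a contract address to a hypothetical family (cluster) id.
    """
    splits: Dict[str, List[str]] = defaultdict(list)
    for address, family in family_of.items():
        # Hash the family id, not the address, so near-duplicates share a split.
        bucket = int(hashlib.sha256(family.encode()).hexdigest(), 16) % 10
        name = "train" if bucket < 7 else "val" if bucket < 9 else "test"
        splits[name].append(address)
    return dict(splits)


# Hypothetical addresses and family ids, for illustration only.
splits = split_by_family({"0xa1": "fam1", "0xa2": "fam1", "0xb1": "fam2"})
split_of = {addr: name for name, addrs in splits.items() for addr in addrs}
```

Because the bucket depends only on the family id, two contracts from the same family can never end up in different splits.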
3. Network Architecture, Optimization, and Loss Formulations
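Each segment is first encoded locally, and the pooled contract representation is obtained by masked mean pooling, which averages only over positions that the validity mask marks as real data. A multi-task reconstruction objective is trained alongside the score regression, and the total loss combines these task terms.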
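Masked mean pooling, as used for the contract representation, can be illustrated in plain Python. This is a generic sketch of the technique rather than the model's actual code; the function name and the zero-vector fallback for an all-padding input are assumptions.

```python
from typing import List


def masked_mean_pool(vectors: List[List[float]], mask: List[int]) -> List[float]:
    """Average only the vectors whose mask entry is 1; padded entries are ignored."""
    dim = len(vectors[0])
    count = sum(mask)
    if count == 0:
        # Assumed convention: no valid positions yields a zero vector.
        return [0.0] * dim
    pooled = [0.0] * dim
    for vec, keep in zip(vectors, mask):
        if keep:
            for j in range(dim):
                pooled[j] += vec[j]
    return [x / count for x in pooled]


# The third vector is padding and must not influence the result.
pooled = masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

Dividing by the number of valid positions, rather than the padded length, keeps short contracts from being biased toward zero.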
Optimization employs AdamW (learning rate , weight decay , batch size $24$, $20$ epochs, gradient clipping $0.5$) on A100-class GPUs.
4. Performance Evaluation and Ablation Analysis
On a held-out test set of 104,000 Ethereum contracts, HObfNET achieves a MAPE of 8.20%, an MAE of 0.6341, an MSE of 1.4477, and a PCC of 0.9158.
Ablation results underscore the value of the hierarchical and multi-task design. The following table compares architecture variants:
| Model | MAPE % | MAE | MSE | PCC |
|---|---|---|---|---|
| Standard Transformer | 16.29 | 0.9521 | 2.7147 | 0.8466 |
| HAN w/ GRU | 14.28 | 0.8794 | 2.4511 | 0.8484 |
| HAN w/o multi-task | 13.02 | 0.8359 | 2.3371 | 0.8619 |
| HAN (full HObfNET) | 8.20 | 0.6341 | 1.4477 | 0.9158 |
HObfNET scores a contract in $8$–$9$ ms (batch size $200$; $8.67$ ms on BSC). ObfProbe takes a median of $41$ s and a mean of $19.66$ s per contract, single-threaded, so the surrogate is faster by roughly three orders of magnitude, which makes million-scale audit pre-filtering practical.
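The size of the speedup follows directly from the quoted timings. The short calculation below uses only the figures stated above; the variable and function names are illustrative.

```python
hobfnet_ms = 8.67         # per-contract HObfNET time reported for BSC
obfprobe_mean_s = 19.66   # ObfProbe mean time per contract
obfprobe_median_s = 41.0  # ObfProbe median time per contract, as quoted


def speedup(baseline_seconds: float, surrogate_ms: float) -> float:
    """Ratio of baseline time to surrogate time, both expressed in milliseconds."""
    return baseline_seconds * 1000.0 / surrogate_ms


mean_speedup = speedup(obfprobe_mean_s, hobfnet_ms)
median_speedup = speedup(obfprobe_median_s, hobfnet_ms)

# Wall-clock cost of scoring one million contracts sequentially.
hobfnet_hours = 1_000_000 * hobfnet_ms / 1000.0 / 3600.0
obfprobe_days = 1_000_000 * obfprobe_mean_s / 86400.0
```

Under these figures, a million contracts take HObfNET a few hours, whereas single-threaded ObfProbe at its mean rate would need well over half a year.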
5. Threshold Methodology and Chain-Specific Tail Analysis
Audit candidates are selected by thresholding at chain-specific percentiles: for each chain, the cutoffs are taken as percentiles (here p99 and p99.9) of that chain's own score distribution.
Observed p99 and p99.9 thresholds (rounded):
| Chain | p99 | p99.9 |
|---|---|---|
| Ethereum | 18.07 | 22.69 |
| BSC | 16.82 | 19.74 |
| Polygon | 18.72 | 20.51 |
| Avalanche | 19.18 | 20.67 |
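Direct transfer of Ethereum thresholds to other chains inflates or deflates the queues because of score drift. For example, BSC's own p99 (16.82) lies below the Ethereum p99 (18.07), so under 1% of BSC contracts clear the Ethereum cutoff, while Avalanche's p99 (19.18) lies above it, so more than 1% of Avalanche contracts do. This motivates chain-specific stratified queues.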
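The effect of transferring a threshold between chains can be reproduced on toy data. The sketch below uses a nearest-rank percentile and synthetic scores; the chain names `chain_a` and `chain_b` and all values are made up for illustration and do not come from the paper.

```python
import math
from typing import Dict, List


def percentile(scores: List[float], q: float) -> float:
    """Nearest-rank percentile: smallest score with at least q% of values at or below it."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def fraction_above(scores: List[float], threshold: float) -> float:
    """Share of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Synthetic score distributions; chain_b is systematically lower than chain_a.
scores_by_chain: Dict[str, List[float]] = {
    "chain_a": [float(i) for i in range(1, 101)],
    "chain_b": [float(i) * 0.9 for i in range(1, 101)],
}

thresholds = {chain: percentile(s, 99) for chain, s in scores_by_chain.items()}

# Applying chain_a's cutoff to chain_b empties the queue ...
transferred = fraction_above(scores_by_chain["chain_b"], thresholds["chain_a"])
# ... whereas chain_b's own cutoff keeps the intended top share.
native = fraction_above(scores_by_chain["chain_b"], thresholds["chain_b"])
```

A chain whose scores run lower than the reference chain ends up with a deflated queue under the transferred cutoff, which is the drift problem that per-chain percentiles avoid.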
6. Audit Queueing, Structural Triage, and Cross-Chain Linkage
HObfNET enables a two-tier audit queue for each chain:
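- Main Queue: contracts whose score reaches the chain's p99 threshold
- Emergency Queue: contracts whose score reaches the chain's p99.9 threshold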
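A two-tier assignment against the per-chain thresholds tabulated above might look like the following. The inclusive comparison and the rule that the emergency tier takes precedence over the main tier are assumptions of this sketch, as are the function and dictionary names.

```python
from typing import Dict, Optional

# p99 / p99.9 thresholds as listed in the chain table.
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "ethereum": {"p99": 18.07, "p99.9": 22.69},
    "bsc": {"p99": 16.82, "p99.9": 19.74},
    "polygon": {"p99": 18.72, "p99.9": 20.51},
    "avalanche": {"p99": 19.18, "p99.9": 20.67},
}


def assign_queue(chain: str, score: float) -> Optional[str]:
    """Return 'emergency', 'main', or None for a contract's predicted score."""
    cutoffs = THRESHOLDS[chain]
    # Assumption: the stricter tier wins when both thresholds are met.
    if score >= cutoffs["p99.9"]:
        return "emergency"
    if score >= cutoffs["p99"]:
        return "main"
    return None


same_score_bsc = assign_queue("bsc", 17.5)
same_score_eth = assign_queue("ethereum", 17.5)
```

The same score of 17.5 is queued on BSC but not on Ethereum, which is the practical consequence of using chain-specific cutoffs.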
Further triage within queues leverages structural cues:
- Low signature density (signatures per KB)
- Enriched external-call opcodes: e.g., DUP8–DUP11, STATICCALL, RETURNDATASIZE/COPY, GAS
- Rare selectors: selector lift of $10\times$ or more over baseline
- Proxy indicator enrichment: the proxy fraction in the BSC main queue exceeds the background rate
Cross-chain linkage is triggered once high-score clusters (identical bytecode hashes) are flagged, prompting an immediate lookup of the same hashes on the other chains. The Jaccard overlap in the tail is at least $1.5\times$ that of overall contract reuse, and directional diffusion favors small-to-large chain propagation.
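Overlap between chains can be measured with the Jaccard index over sets of bytecode hashes, and flagged hashes can be looked up on the other chain by set intersection. The hash strings below are placeholders, and the helper is a generic sketch rather than the paper's pipeline.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder bytecode hashes of high-score (tail) contracts on two chains.
tail_chain_a = {"h1", "h2", "h3"}
tail_chain_b = {"h2", "h3", "h4"}

tail_overlap = jaccard(tail_chain_a, tail_chain_b)
# Hashes flagged on one chain that also appear on the other.
linked = tail_chain_a & tail_chain_b
```

Comparing `tail_overlap` with the same index computed over all contracts shows whether the high-score tail is reused across chains more than contracts in general.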
For incident samples, all publicly alignable cases fall within the p99 queue. For example:
- Transit Swap DEX Hack (2022-10-02): p99.74
- New Free DAO Flash Loan (2022-09-08): p99.21
Both are in the main queue, not the extreme (p99.9) queue.
7. Mathematical Formalisms and Metrics
The following quantities underpin HObfNET’s operation:
- ObfProbe Z-score
- Masked mean pooling
- Multi-output fusion
- Composite loss function
- MAPE (mean absolute percentage error)
- PCC (Pearson correlation coefficient)
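For reference, the two evaluation metrics have standard textbook definitions, given here in assumed notation: $y_i$ is the ObfProbe score of contract $i$, $\hat{y}_i$ the HObfNET prediction, $\bar{y}$ and $\bar{\hat{y}}$ their means, and $N$ the number of test contracts. The symbols are chosen for this illustration and may differ from those of the original paper.

$$
\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

$$
\mathrm{PCC} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}}
$$

MAPE is undefined when some $y_i = 0$, so it presupposes that the reference scores are nonzero.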
8. Practical and Research Implications
HObfNET establishes a tractable pipeline for operationalizing obfuscation signals at scale, supporting multi-chain security audits, queueing, and forensic linkage. The efficiency and cross-chain generalization suggest actionable prioritization for security incident response. The approach reveals protocol-level score drift and tail characteristics (opcode enrichments, rare selectors) relevant for both automated and manual asset triage. All results and methodologies derive from (Zhao et al., 24 Jan 2026).