Papers
Topics
Authors
Recent
Search
2000 character limit reached

UniASM: Binary Code Similarity Detection without Fine-tuning

Published 28 Oct 2022 in cs.CR, cs.LG, and cs.SE | (2211.01144v4)

Abstract: Binary code similarity detection (BCSD) is widely used in various binary analysis tasks such as vulnerability search, malware detection, clone detection, and patch analysis. Recent studies have shown that the learning-based binary code embedding models perform better than the traditional feature-based approaches. However, previous studies have not delved deeply into the key factors that affect model performance. In this paper, we design extensive ablation studies to explore these influencing factors. The experimental results have provided us with many new insights. We have made innovations in both code representation and model selection: we propose a novel rich-semantic function representation technique to ensure the model captures the intricate nuances of binary code, and we introduce the first UniLM-based binary code embedding model, named UniASM, which includes two newly designed training tasks to learn representations of binary functions. The experimental results show that UniASM outperforms the state-of-the-art (SOTA) approaches on the evaluation datasets. The average scores of Recall@1 on cross-compilers, cross-optimization-levels, and cross-obfuscations have improved by 12.7%, 8.5%, and 22.3%, respectively, compared to the best of the baseline methods. Besides, in the real-world task of known vulnerability search, UniASM outperforms all the current baselines.

Citations (8)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (4)

Collections

Sign up for free to add this paper to one or more collections.