Translating machine code to human-readable high-level languages

Establish effective techniques for translating machine code into human-readable high-level programming languages in the context of reverse engineering.

Background

Classical decompilers often yield functionally correct but hard-to-read outputs lacking meaningful identifiers and idioms, motivating research into more human-readable decompilation. Recent machine learning approaches have improved decompilation to C, but translating low-level machine code into high-level, idiomatic code across languages remains a fundamental challenge.

Abualazm et al. (2026) investigate small, specialized LLMs for decompiling x86-64 assembly into modern languages (Dart and Swift). They assess readability with CodeBLEU and syntactic validity with compile@k. The study reports promising results for Dart, but it identifies the broader challenge of reliably translating machine code into human-readable high-level code as an open research problem.
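The summary names compile@k but does not define it. The sketch below assumes compile@k is computed like the widely used unbiased pass@k estimator, with "the generated code compiles" as the success criterion. For each function, n candidates are sampled and c of them compile. The estimate is the probability that at least one of k candidates, drawn without replacement from those n, compiles. The function names `compile_at_k` and `mean_compile_at_k` are hypothetical, and the cited paper may define or aggregate the metric differently.

```python
from math import comb


def compile_at_k(n: int, c: int, k: int) -> float:
    """Hypothetical compile@k estimator (assumed to mirror unbiased pass@k).

    n: number of decompiled candidates generated for one function
    c: how many of those candidates compile successfully
    k: sample budget being evaluated
    Returns the probability that a random size-k subset of the n candidates
    contains at least one compiling candidate.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Too few failing candidates to fill a size-k subset,
        # so every subset includes at least one compiling candidate.
        return 1.0
    # 1 - P(all k drawn candidates fail to compile)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_compile_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-function estimate; each entry is an (n, c) pair."""
    return sum(compile_at_k(n, c, k) for n, c in results) / len(results)
```

For example, if 3 of 10 candidates for a function compile, `compile_at_k(10, 3, 1)` evaluates to roughly 0.3. This matches the intuition that a single random sample compiles 30% of the time. The combinatorial form is used instead of simply checking the first k samples because it averages over every possible size-k subset and therefore gives a lower-variance estimate from the same n generations.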

References

Translating machine code into human-readable high-level languages is an open research problem in reverse engineering.

LLMs as Idiomatic Decompilers: Recovering High-Level Code from x86-64 Assembly for Dart (2604.02278 - Abualazm et al., 2 Apr 2026) in Abstract (p. 1)