EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities

Published 24 Sep 2024 in cs.AI | (2409.16165v3)

Abstract: Although language model (LM) agents have demonstrated increased performance in multiple domains, including coding and web-browsing, their success in cybersecurity has been limited. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. We introduce new tools and interfaces to improve the agent's ability to find and exploit security vulnerabilities, focusing on interactive terminal programs. These novel Interactive Agent Tools enable LM agents, for the first time, to run interactive utilities, such as a debugger and a server connection tool, which are essential for solving these challenges. Empirical analysis on 390 CTF challenges across four benchmarks demonstrates that these new tools and interfaces substantially improve our agent's performance, achieving state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench. Finally, we analyze data leakage, developing new methods to quantify it and identifying a new phenomenon we term soliloquizing, where the model self-generates hallucinated observations without interacting with the environment. Our code and development dataset are available at https://github.com/SWE-agent/SWE-agent/tree/v0.7 and https://github.com/NYU-LLM-CTF/NYU_CTF_Bench/tree/main/development respectively.

Summary

The paper introduces EnIGMA, which leverages novel interactive command-line interfaces to autonomously solve cybersecurity Capture The Flag challenges.
It employs an Interactive Agent Tool that dynamically integrates debuggers and server connection utilities for effective reverse engineering and remote analysis.
Evaluations on 390 CTF challenges across four benchmarks show that EnIGMA achieves state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench.

EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges

The paper introduces EnIGMA, an Enhanced Interactive Generative Model Agent designed to autonomously solve Capture The Flag (CTF) challenges. Previous language model (LM) agents have shown limited success in cybersecurity because of simplistic designs and inadequate features. EnIGMA instead builds on new Agent-Computer Interfaces (ACIs) tailored to the domain, giving the agent specialized tools for tasks that general-purpose agent interfaces handle poorly.

Overview and Methodology

The primary contribution of the paper is the Interactive Agent Tool (IAT), which extends the ACI concept presented in the SWE-agent framework. IATs let LM agents drive interactive command-line utilities such as debuggers and server connection tools. CTF challenges often require exactly this kind of interaction: stepping through a program in a debugger, or exchanging messages with a remote server over several turns.
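The core idea behind an interactive tool, a program that stays alive between agent actions, can be illustrated with a short sketch. This is not the authors' implementation: the class name InteractiveSession, its methods, and the toy child program are all hypothetical, and only the Python standard library is used.

```python
import queue
import subprocess
import sys
import threading


class InteractiveSession:
    """Hypothetical helper: keeps one child process alive across many agent actions."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered on our side of the pipes
        )
        self._lines = queue.Queue()
        # A background thread drains stdout so a silent program cannot block the caller.
        threading.Thread(target=self._pump, daemon=True).start()

    def _pump(self):
        for line in self.proc.stdout:
            self._lines.put(line.rstrip("\n"))

    def send(self, command, timeout=5.0):
        """Send one line, then collect output until the program goes quiet."""
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        output = []
        try:
            while True:
                output.append(self._lines.get(timeout=timeout))
                timeout = 0.2  # after the first line, wait only briefly for more
        except queue.Empty:
            pass
        return output

    def close(self):
        self.proc.stdin.close()  # EOF lets the child exit on its own
        self.proc.wait(timeout=5)


# Toy stand-in for an interactive utility: it numbers each line it receives,
# so the counter shows that state survives between separate send() calls.
CHILD_PROGRAM = (
    "import sys\n"
    "count = 0\n"
    "for line in sys.stdin:\n"
    "    count += 1\n"
    "    print(f'{count}: {line.strip()}', flush=True)\n"
)

if __name__ == "__main__":
    session = InteractiveSession([sys.executable, "-u", "-c", CHILD_PROGRAM])
    print(session.send("first"))
    print(session.send("second"))
    session.close()
```

Each call to send corresponds to one agent action, while the child process and its internal state persist in between. A production tool would also need to handle programs that never go quiet or that exit unexpectedly.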

EnIGMA is built with robust interfaces for two main interactive tools:

EnIGMA Debugger: This interface incorporates commands for starting a gdb session, adding breakpoints, stepping through instructions, continuing execution, and executing arbitrary gdb commands. These capabilities are crucial for reverse engineering and dynamic program analysis tasks.
EnIGMA Server Connection Tool: Built on the pwntools library, this tool connects to remote servers and lets the agent send and receive data interactively. This supports web exploitation and binary exploitation challenges, which often involve a remote server.
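The debugger interface described above can be pictured as a thin translation layer between agent actions and gdb commands. The sketch below is an illustration under stated assumptions: the action names (add_breakpoint, continue, step, exec) and both function names are hypothetical, not taken from the paper. The gdb pieces themselves (the -q and --args options, and the break, continue, and stepi commands) are standard gdb.

```python
def gdb_launch_argv(binary, program_args=()):
    """Build the command line that starts gdb quietly on a binary with its arguments."""
    return ["gdb", "-q", "--args", binary, *program_args]


# Hypothetical agent-facing action names mapped to real gdb command templates.
GDB_TEMPLATES = {
    "add_breakpoint": "break {arg}",  # e.g. a function name or *ADDRESS
    "continue": "continue",
    "step": "stepi {arg}",            # step N machine instructions (default 1)
    "exec": "{arg}",                  # pass an arbitrary gdb command through
}


def to_gdb_command(action, arg=""):
    """Translate one agent action into the gdb command line to send."""
    if action not in GDB_TEMPLATES:
        raise ValueError(f"unknown debugger action: {action}")
    return GDB_TEMPLATES[action].format(arg=arg).strip()


if __name__ == "__main__":
    print(gdb_launch_argv("./chal", ["input.txt"]))
    print(to_gdb_command("add_breakpoint", "*0x401136"))  # break *0x401136
    print(to_gdb_command("step", "3"))                    # stepi 3
    print(to_gdb_command("exec", "info registers"))       # info registers
```

Keeping the action set small gives the agent a few well-documented commands for the common cases, while the pass-through action preserves access to everything else gdb can do.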

The authors evaluated EnIGMA on 390 CTF challenges drawn from four benchmarks: NYU CTF, Intercode-CTF, CyBench, and HackTheBox. EnIGMA achieves state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench, outperforming existing agents.

Strong Numerical Results and Implications

The paper's empirical analysis examines which features contribute most to solving CTF challenges. Key results include:

EnIGMA solved more than three times as many challenges as prior agents on the NYU CTF benchmark, reaching up to 13.5% success with Claude 3.5 Sonnet.
Adding LM summarization and in-context learning through demonstrations improved the agent's handling of long-context inputs and its problem-solving ability.
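One way to picture the summarization idea is as a gate on tool output: short output passes through unchanged, and long output is handed to a summarizer before it reaches the agent's context. The sketch below assumes this gating design. The function condense_observation, the max_chars threshold, and the stand-in summarizer are all hypothetical, and a real system would call an LM where the stub is used.

```python
from typing import Callable


def condense_observation(
    output: str,
    summarize: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Return short output unchanged; route long output through a summarizer."""
    if len(output) <= max_chars:
        return output
    summary = summarize(output)
    return f"[output of {len(output)} characters summarized]\n{summary}"


def first_and_last_line(text: str) -> str:
    """Stand-in for an LM call: keeps only the first and last lines."""
    lines = text.splitlines()
    return f"{lines[0]}\n...\n{lines[-1]}"


if __name__ == "__main__":
    short = "flag.txt  notes.md"
    long_output = "\n".join(f"line {i}" for i in range(1000))
    print(condense_observation(short, first_and_last_line))
    print(condense_observation(long_output, first_and_last_line))
```

Passing the summarizer in as a callable keeps the gate independent of any particular model or API.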

These results highlight the effectiveness of the new IATs and ACI-driven interfaces in enhancing LM agent performance in cybersecurity. The results also emphasize the importance of using demonstrations and learning from successful problem-solving techniques to guide agents in similar challenges.

Future Implications in AI and Cybersecurity

The development of EnIGMA opens several avenues for future research. It suggests potential extensions for real-time cybersecurity applications, where LMs can be utilized not only for CTF challenges but also to automate intrusion detection and vulnerability management. Furthermore, the approach pioneered by EnIGMA could be adapted to automate other cybersecurity tasks that require a combination of dynamic and static program analysis.

Well-designed interfaces tailored to LM agents' needs show potential beyond cybersecurity, suggesting that LMs could be applied in a similar way to other specialized domains. The authors also analyze data leakage, developing methods to quantify it, and identify a phenomenon they term soliloquizing, in which the model self-generates hallucinated observations without interacting with the environment. Addressing these challenges could further improve the accuracy and reliability of LM agents across various applications.
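The abstract describes soliloquizing as the model self-generating hallucinated observations without interacting with the environment. A simple heuristic check for that pattern might look like the sketch below. It is an assumption for illustration, not the paper's detection method: the ACTION: and OBSERVATION: line prefixes and both function names are hypothetical, and real agent transcripts may be formatted differently.

```python
def count_prefixed_lines(response: str, prefix: str) -> int:
    """Count lines in a model response that start with the given prefix."""
    return sum(1 for line in response.splitlines() if line.strip().startswith(prefix))


def looks_like_soliloquy(response: str) -> bool:
    """Flag a single model response that appears to simulate the environment.

    Assumed convention: each turn should hold exactly one ACTION: line, and
    OBSERVATION: lines should only ever be written by the environment.
    """
    actions = count_prefixed_lines(response, "ACTION:")
    observations = count_prefixed_lines(response, "OBSERVATION:")
    return actions > 1 or observations > 0


if __name__ == "__main__":
    normal_turn = "THOUGHT: list the files first\nACTION: ls -la"
    imagined_turn = (
        "THOUGHT: list the files first\n"
        "ACTION: ls -la\n"
        "OBSERVATION: flag.txt\n"
        "ACTION: cat flag.txt"
    )
    print(looks_like_soliloquy(normal_turn))    # False
    print(looks_like_soliloquy(imagined_turn))  # True
```

A check like this only flags the surface pattern. It cannot tell whether the imagined output came from leaked training data or from plausible guessing.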

Overall, the paper presents a thoughtful and detailed contribution to the field of AI-driven cybersecurity tools, providing valuable insights for further research and development of LM agents capable of addressing real-world cybersecurity problems. As model architectures evolve and agents become more sophisticated, EnIGMA sets a solid precedent for future advancements in LM-driven cybersecurity solutions.