- The paper introduces CVE-Genie, a multi-agent framework leveraging LLMs to automate the reproduction of software vulnerabilities.
- The framework decomposes tasks into four modules—Processor, Builder, Exploiter, and CTF Verifier—to systematically create verifiable exploit datasets.
- The approach reproduced 428 of 841 CVEs (about 51%), highlighting its potential for enhancing vulnerability detection and security assessments.
From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs
The paper introduces "CVE-Genie", a multi-agent framework that uses LLMs to automate the reproduction of software vulnerabilities and so build precise vulnerability datasets. This essay explains its architecture, implementation, and potential real-world applications.
CVE-Genie Overview and Architecture
CVE-Genie is designed to reproduce the vulnerabilities described in CVE entries using a multi-agent framework that combines LLMs with several automated stages. It comprises four core components: Processor, Builder, Exploiter, and CTF Verifier, each responsible for one step of the end-to-end reproduction pipeline. The architecture is intended to process large volumes of CVE data and turn them into verifiable exploits.
Figure 1: CVE-Genie Overview.
The framework relies on LLMs for their capability in software engineering (SWE) tasks. CVE-Genie’s architecture follows three key principles: modular task decomposition, robustness against incomplete data, and reliability through self-critique. Together these allow reproduction even from sparse CVE data.
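The modular decomposition described above can be pictured as a chain of stages that share one state record and stop at the first failure. The following is a minimal sketch under stated assumptions, not the paper's implementation: `PipelineState`, `run_pipeline`, `make_stub`, and the placeholder identifier `CVE-0000-00000` are all hypothetical names introduced for illustration, and each stub stands in for a module that would really drive LLM agents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PipelineState:
    """Shared record passed from stage to stage (hypothetical structure)."""
    cve_id: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, bool]] = field(default_factory=list)


Stage = Callable[[PipelineState], bool]


def run_pipeline(state: PipelineState, stages: List[Tuple[str, Stage]]) -> bool:
    """Run stages in order and stop at the first one that reports failure."""
    for name, stage in stages:
        ok = stage(state)
        state.log.append((name, ok))
        if not ok:
            return False
    return True


def make_stub(name: str, succeed: bool = True) -> Stage:
    """Build a placeholder stage; a real stage would drive LLM agents."""
    def stage(state: PipelineState) -> bool:
        if succeed:
            state.artifacts[name] = f"{name} output for {state.cve_id}"
        return succeed
    return stage


# Stage names mirror the four modules named in the text.
STAGE_NAMES = ["processor", "builder", "exploiter", "ctf_verifier"]

if __name__ == "__main__":
    state = PipelineState(cve_id="CVE-0000-00000")  # placeholder identifier
    stages = [(n, make_stub(n)) for n in STAGE_NAMES]
    print(run_pipeline(state, stages), [n for n, _ in state.log])
```

Keeping the stages behind one small function signature means a failure is attributed to a single module through the log, which is one plausible reading of why the task is decomposed at all.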
Detailed Module Functions
Processor
The Processor module extracts raw data from CVE entries, including source code and security advisories, to create a structured knowledge base. This involves:
- Data Processor: Collects the vulnerable software versions and their specific configurations, including source code from public repositories.
- Knowledge Builder: Transforms the gathered data into a format the subsequent modules can use, retaining the CVE details that are essential for exploit reproduction.
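One way to picture the structured knowledge base is as a typed record that tolerates gaps and reports them. This is a sketch only: the field names, `build_knowledge_base`, and `missing_fields` are assumptions made for illustration, not the schema used by CVE-Genie.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class KnowledgeBase:
    """Hypothetical record of what later modules need to know about a CVE."""
    cve_id: str
    description: str = ""
    vulnerable_version: str = ""
    repo_url: str = ""
    advisories: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """List fields that are still empty, so later stages can compensate."""
        return [name for name, value in asdict(self).items() if not value]


def build_knowledge_base(raw: dict) -> KnowledgeBase:
    """Keep only the known fields from a raw entry and ignore everything else."""
    known = {k: v for k, v in raw.items() if k in KnowledgeBase.__dataclass_fields__}
    return KnowledgeBase(**known)


if __name__ == "__main__":
    raw_entry = {
        "cve_id": "CVE-0000-00000",           # placeholder identifier
        "description": "Example description",  # illustrative text only
        "unrelated_key": 123,                  # dropped by the builder
    }
    kb = build_knowledge_base(raw_entry)
    print(kb.missing_fields())
```

Reporting missing fields instead of rejecting the entry matches the robustness-to-incomplete-data goal stated earlier: a sparse CVE entry still yields a usable record.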
Builder
This module reconstructs the vulnerable environment using data from the Processor. It involves:
- Pre-Requisite Developer Agent: Analyzes project requirements and plans environment setup.
- Setup Developer and Critic Agents: Execute setup commands and verify configurations, ensuring the vulnerable environment is operational.
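The execute-then-verify split between the Setup Developer and its Critic might look roughly like the sketch below. It assumes the setup plan is a list of commands and that the critic's check is simply "every planned command ran and exited cleanly"; the real agents presumably inspect much more. `run_setup`, `critic_accepts`, and `CommandResult` are hypothetical names, and the Python one-liners stand in for real installation commands.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class CommandResult:
    command: List[str]
    returncode: int
    output: str


def run_setup(commands: List[List[str]], timeout: int = 60) -> List[CommandResult]:
    """Run setup commands in order, capturing output; stop at the first failure."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        results.append(CommandResult(cmd, proc.returncode, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break
    return results


def critic_accepts(results: List[CommandResult], expected_count: int) -> bool:
    """Accept only if every planned command ran and exited with status 0."""
    return len(results) == expected_count and all(r.returncode == 0 for r in results)


if __name__ == "__main__":
    # Stand-in commands; a real plan would install and start the vulnerable software.
    plan = [
        [sys.executable, "-c", "print('dependencies installed')"],
        [sys.executable, "-c", "print('service started')"],
    ]
    results = run_setup(plan)
    print(critic_accepts(results, expected_count=len(plan)))
```

Capturing each command's output alongside its exit status gives a critic something concrete to review when a setup attempt fails.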
Exploiter
The Exploiter module generates and tests exploits within this configured environment:
- Exploit Developer Agent: Uses structured data to recreate or generate exploits.
- Exploit Critic Agent: Evaluates and critiques exploit attempts to ensure fidelity and effectiveness against CVE descriptions.
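The developer and critic pairing can be read as a bounded feedback loop: the developer proposes, the critic either accepts or returns an objection, and the objection feeds the next attempt. The sketch below assumes that reading. `refine`, the stub agents, and the round limit of 3 are hypothetical; in the framework both roles would be LLM-backed rather than plain functions.

```python
from typing import Callable, Optional, Tuple

# A developer takes the previous critique (or None) and returns a new candidate.
Developer = Callable[[Optional[str]], str]
# A critic returns None to accept, or a textual objection to reject.
Critic = Callable[[str], Optional[str]]


def refine(develop: Developer, critique: Critic, max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Alternate developer and critic until acceptance or the round limit."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = develop(feedback)
        feedback = critique(candidate)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


def stub_developer(feedback: Optional[str]) -> str:
    """Placeholder developer: revises its draft once it has received feedback."""
    return "draft-1" if feedback is None else "draft-2 (revised)"


def stub_critic(candidate: str) -> Optional[str]:
    """Placeholder critic: accepts only revised drafts."""
    return None if "revised" in candidate else "does not match the CVE description"


if __name__ == "__main__":
    print(refine(stub_developer, stub_critic))
```

Bounding the number of rounds matters because an overly strict critic would otherwise keep rejecting indefinitely.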
CTF Verifier
Finally, the CTF Verifier checks that the produced exploits reliably reproduce the vulnerabilities.
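The text does not describe how this check works. The sketch below illustrates one common capture-the-flag style of verification, offered purely as an assumption: a fresh random flag is planted in the environment before each trial, and an exploit only counts as verified if it recovers that exact flag every time. `verify`, the dictionary environment, and both stub exploits are hypothetical.

```python
import secrets
from typing import Callable, Dict

Exploit = Callable[[Dict[str, str]], str]


def verify(exploit: Exploit, trials: int = 3) -> bool:
    """Plant a fresh random flag per trial; pass only if every trial recovers it."""
    for _ in range(trials):
        flag = "FLAG{" + secrets.token_hex(8) + "}"
        environment = {"secret": flag}  # stand-in for a real vulnerable target
        if exploit(environment) != flag:
            return False
    return True


def leaking_exploit(environment: Dict[str, str]) -> str:
    """Placeholder exploit that really reads the planted secret."""
    return environment["secret"]


def guessing_exploit(environment: Dict[str, str]) -> str:
    """Placeholder exploit that only claims success with a fixed string."""
    return "FLAG{guess}"


if __name__ == "__main__":
    print(verify(leaking_exploit), verify(guessing_exploit))
```

Because the flag changes on every trial, an exploit cannot pass by hard-coding an expected output, and repeating the trial addresses the "reliably" part of the requirement.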
Implementation Considerations
CVE-Genie's implementation considers the following:
Computational Requirements and Trade-offs
- Resource Efficiency: Each agent works mainly with the context relevant to its own task, which helps manage extensive requirements and adapts the LLM to that task.
- Error Handling: Robust feedback mechanisms allow iterative improvements, essential for handling complex open-source environments and incomplete advisories.
CVE-Genie successfully reproduced 428 out of 841 CVEs (about 51%) across diverse programming languages and projects. Performance metrics showed that web vulnerabilities were reproduced more often than memory-safety issues in system dependencies.
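The headline figure can be checked directly from the two counts given in the text. The helper name `success_rate` is introduced here only for illustration.

```python
def success_rate(reproduced: int, attempted: int) -> float:
    """Return the fraction of attempted CVEs that were reproduced."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return reproduced / attempted


if __name__ == "__main__":
    rate = success_rate(428, 841)  # counts reported in the text
    print(f"{rate:.1%}")  # prints 50.9%
```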
Scaling and Future Improvements
Scaling considerations include better UI interaction for CVEs that involve web interfaces and the integration of multimodal data processing. Further research will address the critic agents' over-stringency and explore cost optimization for broader usability.
Application in AI and Security
CVE-Genie has potential applications in several areas:
- Vulnerability Detection: Provides datasets for training and benchmark testing ML models.
- Software Security Evaluation: Facilitates rigorous testing of patching efforts and secure code generation capabilities.
- AI-assisted Development: Enhances penetration testing and attack detection through feasible recreation of complex exploit chains.
Conclusion
CVE-Genie represents a significant advance in automated CVE reproduction, using LLMs in a multi-agent framework to rapidly create high-quality, reproducible vulnerability datasets. It addresses data scarcity and thereby supports research on automated vulnerability assessment and prediction tools. Future adaptations will explore broader context detection and multimodal capabilities to further extend its reproduction scope and reliability.