BugScope: Learn to Find Bugs Like Human

Published 21 Jul 2025 in cs.SE | (2507.15671v1)

Abstract: Detecting software bugs remains a fundamental challenge due to the extensive diversity of real-world defects. Traditional static analysis tools often rely on symbolic workflows, which restrict their coverage and hinder adaptability to customized bugs with diverse anti-patterns. While recent advances incorporate LLMs to enhance bug detection, these methods continue to struggle with sophisticated bugs and typically operate within limited analysis contexts. To address these challenges, we propose BugScope, an LLM-driven multi-agent system that emulates how human auditors learn new bug patterns from representative examples and apply that knowledge during code auditing. Given a set of examples illustrating both buggy and non-buggy behaviors, BugScope synthesizes a retrieval strategy to extract relevant detection contexts via program slicing and then constructs a tailored detection prompt to guide accurate reasoning by the LLM. Our evaluation on a curated dataset of 40 real-world bugs drawn from 21 widely-used open-source projects demonstrates that BugScope achieves 87.04% precision and 90.00% recall, surpassing state-of-the-art industrial tools by 0.44 in F1 score. Further testing on large-scale open-source systems, including the Linux kernel, uncovered 141 previously unknown bugs, of which 78 have been fixed and 7 confirmed by developers, highlighting BugScope's substantial practical impact.


Summary

The paper presents a novel human-inspired approach that synthesizes retrieval strategies and LLM-driven prompts to mimic expert bug detection.
The methodology employs two agents, one for context retrieval and one for bug detection, which learn bug patterns from labeled examples.
Evaluations show BugScope outperforms traditional tools with superior precision and recall, effectively identifying diverse and system-specific bugs.

Summary of "BugScope: Learn to Find Bugs Like Human"

Introduction to Bug Detection Challenges

Detecting software bugs is a persistent challenge in software engineering, made harder by the extensive diversity of real-world bugs. Traditional static analysis tools rely heavily on symbolic workflows, which limit their coverage and their adaptability to bugs characterized by diverse anti-patterns. Although recent work has harnessed LLMs for bug detection, these approaches still struggle with sophisticated bugs and often operate within limited analysis contexts.

BugScope is designed to emulate how human auditors learn new bug patterns from examples and apply that knowledge during code audits. Given examples of buggy and non-buggy behavior, BugScope synthesizes a retrieval strategy that gathers relevant context via program slicing, then constructs a tailored detection prompt to guide the LLM's reasoning. In evaluation, BugScope outperformed existing industrial tools, reaching 87.04% precision and 90.00% recall.
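
Program slicing keeps only the statements that can affect a chosen point of interest (the slicing criterion). The summary does not say which criteria or dependence analysis BugScope's synthesized retrieval strategy uses, so the following is only a generic backward-slice sketch over a toy, hand-written dependence graph; the names `backward_slice` and `example_deps` and the statement ids are hypothetical.

```python
from collections import deque


def backward_slice(deps: dict[str, set[str]], criterion: str) -> set[str]:
    """Return every statement the criterion transitively depends on, inclusive.

    `deps` maps a statement id to the ids it directly depends on (data or
    control dependences). This toy graph is a hypothetical stand-in for a
    real program dependence graph.
    """
    seen = {criterion}
    work = deque([criterion])
    while work:  # breadth-first walk backwards along dependence edges
        stmt = work.popleft()
        for pred in deps.get(stmt, set()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen


# Hypothetical C-like snippet:
#   s1: p = malloc(n);   s2: if (p == NULL) return;   s3: q = p;
#   s4: log("start");    s5: free(q);
example_deps = {
    "s2": {"s1"},        # the NULL check reads p
    "s3": {"s1", "s2"},  # q = p reads p and runs only if s2 did not return
    "s4": set(),         # logging is unrelated to p
    "s5": {"s3", "s2"},  # free(q) reads q and is guarded by s2
}

print(sorted(backward_slice(example_deps, "s5")))  # ['s1', 's2', 's3', 's5']
```

The slice for `free(q)` drops the unrelated logging statement. This is the sense in which slicing narrows the context handed to an LLM without losing the statements that matter for the pattern.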

Figure 1: Examples of anti-patterns causing various types of bugs.

Motivations Behind Bug Detection

Software bugs pose significant threats to system security, causing critical failures such as memory exhaustion and system crashes. The diversity of software weaknesses is catalogued extensively in industry standards such as CWE, which lists hundreds of weakness types and complicates detection efforts. Even bugs of the same type often arise in varied semantic contexts and exhibit diverse anti-patterns that challenge detection methods. System-specific bugs, such as those in the Linux kernel, add another layer of complexity and call for detection approaches that generalize across varied real-world scenarios.

Existing tools such as Meta Infer and CodeQL operate on symbolic rules and lack the flexibility to handle novel or system-specific anti-patterns, which limits their broader applicability. LLM-driven solutions have shown strong semantic reasoning capabilities but still struggle with uncommon anti-patterns because they reason over restricted analysis contexts. Emerging neuro-symbolic approaches partially mitigate this problem, yet they still lack the flexibility to handle diverse anti-patterns effectively.

BugScope's Approach

BugScope mirrors the human code auditing process using two collaborative agents: a context retrieval agent and a bug detection agent. From labeled examples of code exhibiting a specific anti-pattern, the agents synthesize the analysis logic (a retrieval strategy for extracting relevant context and a tailored detection prompt) and thereby automate the audit. By leveraging the code reasoning capabilities of LLMs, BugScope supports highly customizable bug detection across varied anti-patterns, replicating the workflow of expert human auditors.
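
The division of labor between the two agents can be pictured as a retrieve-then-detect loop. The following is a minimal sketch under stated assumptions and is not BugScope's actual implementation. The prompt template, the YES/NO answer protocol, and the names `LabeledExample`, `build_detection_prompt`, `audit`, and `fake_llm` are all hypothetical. The `retrieve` callable stands in for the synthesized retrieval strategy, and `llm` stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LabeledExample:
    """Hypothetical labeled example: a snippet, its verdict, and a rationale."""
    code: str
    is_buggy: bool
    explanation: str


TARGET_MARKER = "Target context (retrieved slice):"


def build_detection_prompt(pattern: str, examples: list[LabeledExample], context: str) -> str:
    """Detection-agent step: a few-shot prompt built from labeled examples plus retrieved context."""
    lines = [f"You are auditing code for this bug pattern: {pattern}.", ""]
    for i, ex in enumerate(examples, start=1):
        label = "BUGGY" if ex.is_buggy else "NOT BUGGY"
        lines += [f"Example {i} ({label}):", ex.code, f"Reason: {ex.explanation}", ""]
    lines += [TARGET_MARKER, context, "", "Answer YES if the target exhibits the pattern, otherwise NO."]
    return "\n".join(lines)


def audit(
    candidates: dict[str, str],
    retrieve: Callable[[str], str],
    examples: list[LabeledExample],
    pattern: str,
    llm: Callable[[str], str],
) -> list[str]:
    """Run retrieval, then detection, on each candidate; return the names the model flags."""
    reports = []
    for name, code in candidates.items():
        context = retrieve(code)  # context retrieval agent (e.g. a program slice)
        prompt = build_detection_prompt(pattern, examples, context)
        if llm(prompt).strip().upper().startswith("YES"):  # detection agent's verdict
            reports.append(name)
    return reports


# Usage with illustrative examples and a rule-based stand-in for the model.
examples = [
    LabeledExample("free(p);\nfree(p);", True, "p is freed twice"),
    LabeledExample("free(p);\np = NULL;", False, "p is cleared after the free"),
]


def fake_llm(prompt: str) -> str:
    # Looks only at the target section so the embedded examples do not leak in.
    target = prompt.split(TARGET_MARKER, 1)[1]
    return "YES" if target.count("free(p)") >= 2 else "NO"


candidates = {"f": "free(p);\nlog();\nfree(p);", "g": "free(p);\np = NULL;"}
print(audit(candidates, lambda code: code, examples, "double free", fake_llm))  # ['f']
```

Keeping retrieval and prompt construction as separate, swappable pieces reflects the idea that both are learned per anti-pattern. A new bug pattern changes the examples and the retrieval strategy, not the surrounding loop.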

Figure 2: Overview of BugScope.

Evaluation and Results

BugScope was evaluated on a curated dataset of 40 real-world bugs drawn from 21 widely used open-source projects. It achieved 87.04% precision and 90.00% recall, exceeding state-of-the-art industrial tools by 0.44 in F1 score. When deployed on large-scale open-source systems, including the Linux kernel, it uncovered 141 previously unknown bugs, of which 78 have been fixed and 7 confirmed by developers, demonstrating its practical impact.
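
F1 is the harmonic mean of precision and recall. The abstract reports 87.04% precision, 90.00% recall, and a 0.44 F1 margin over industrial tools, but this summary does not state BugScope's F1 value directly. The quick calculation below derives it from the reported figures. Reading the margin as an absolute difference in F1 is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


bugscope_f1 = f1_score(0.8704, 0.9000)  # precision and recall from the abstract
print(round(bugscope_f1, 3))  # 0.885

# Assuming the reported 0.44 margin is an absolute F1 difference,
# the strongest industrial baseline would sit at roughly:
print(round(bugscope_f1 - 0.44, 3))  # 0.445
```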

Conclusion

BugScope promises significant advancements in software reliability and security by mimicking human approaches to bug detection. The system's ability to generalize detection strategies across diverse anti-patterns without manual rule crafting highlights its strong potential for broader applicability in real-world settings. As bug detection remains a critical challenge, solutions like BugScope pave the way for future studies on leveraging AI to enhance automated code auditing processes.