
ChatDBG: Augmenting Debugging with Large Language Models

Published 25 Mar 2024 in cs.SE, cs.AI, cs.LG, and cs.PL | arXiv:2403.16354v5

Abstract: Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates LLMs to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or assertion failures, and explore open-ended queries like "why is x null?". To handle these queries, ChatDBG grants the LLM autonomy to "take the wheel": it can act as an independent agent capable of querying and controlling the debugger to navigate through stacks and inspect program state. It then reports its findings and yields back control to the programmer. By leveraging the real-world knowledge embedded in LLMs, ChatDBG can diagnose issues identifiable only through the use of domain-specific reasoning. Our ChatDBG prototype integrates with standard debuggers including LLDB and GDB for native code and Pdb for Python. Our evaluation across a diverse set of code, including C/C++ code with known bugs and a suite of Python code including standalone scripts and Jupyter notebooks, demonstrates that ChatDBG can successfully analyze root causes, explain bugs, and generate accurate fixes for a wide range of real-world errors. For the Python programs, a single query led to an actionable bug fix 67% of the time; one additional follow-up query increased the success rate to 85%. ChatDBG has seen rapid uptake; it has already been downloaded more than 75,000 times.


Summary

  • The paper demonstrates ChatDBG's integration of GPT-4 with established debuggers, achieving up to a 91% success rate in identifying code defects.
  • It leverages enriched stack traces and interactive dialogue to perform root cause analysis and generate effective solutions.
  • Experiments on Python and C/C++ programs validate ChatDBG's potential to transform conventional debugging into an automated, efficient process.

Summary of "ChatDBG: Augmenting Debugging with LLMs"

The paper "ChatDBG: Augmenting Debugging with LLMs" introduces ChatDBG as an AI-powered debugging assistant that combines LLMs with standard debugging tools to enhance their capabilities. ChatDBG allows programmers to engage in collaborative dialogues with debuggers to perform complex debugging tasks such as root cause analysis and exploring open-ended queries.

ChatDBG Capabilities and Implementation

The core strength of ChatDBG lies in its integration with existing debuggers, such as LLDB, GDB, WinDBG, and Pdb. It provides a unified interface for programmers to interact with the debugger by asking high-level questions in natural language. For instance, queries may include "why is x null?" or "why isn't this value what I expected?" ChatDBG grants autonomy to the LLM to navigate through stacks, inspect program state, issue debugging commands, and communicate its findings back to the programmer.
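The "take the wheel" behavior can be pictured as a simple agent loop in which the model is given the ability to run debugger commands and observe their output before reporting back. The sketch below is purely illustrative, assuming a hypothetical `MockDebugger` interface and a scripted command sequence in place of real LLM decisions; it is not ChatDBG's actual API:

```python
# Illustrative "take the wheel" loop: an agent runs debugger commands,
# collects their output, and reports findings back to the user.
# MockDebugger is a hypothetical stand-in for an LLDB/GDB/Pdb session.

from dataclasses import dataclass, field

@dataclass
class MockDebugger:
    """Stands in for a real debugger session (hypothetical interface)."""
    frames: list = field(default_factory=lambda: ["main", "parse", "lookup"])
    locals_by_frame: dict = field(default_factory=lambda: {
        "lookup": {"x": None, "key": "user_id"},
    })

    def run(self, command: str) -> str:
        # A real integration would forward `command` to the debugger's CLI.
        if command == "bt":
            return "\n".join(f"#{i} {f}" for i, f in enumerate(self.frames))
        if command.startswith("info locals"):
            frame = command.split()[-1]
            return repr(self.locals_by_frame.get(frame, {}))
        return f"(unrecognized command: {command})"

def take_the_wheel(debugger: MockDebugger, question: str) -> str:
    """Drive the debugger autonomously; a real LLM would choose each command."""
    transcript = [f"user: {question}"]
    # Scripted stand-in for the LLM's chosen commands:
    for command in ("bt", "info locals lookup"):
        transcript.append(f"debugger> {command}\n{debugger.run(command)}")
    # The LLM would summarize its observations; here the report is hard-coded.
    return "\n".join(transcript) + "\nreport: x is None in frame `lookup`"

print(take_the_wheel(MockDebugger(), "why is x null?"))
```

In the real system the loop terminates when the model stops issuing commands and instead emits its findings, at which point control returns to the programmer.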

The implementation leverages OpenAI's API and GPT-4 models to process textual prompts representing debugging queries. ChatDBG uses enriched stack traces, which include additional context about code and variables, giving the LLM a fuller picture of the program's state and execution and enabling more accurate diagnoses of complex bugs (Figure 1).

Figure 1: ChatDBG architecture and top-level command-processing loop, illustrating communication between the user, the debugger, and the LLM.
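The idea of an enriched stack trace can be approximated in a few lines of standard Python by walking an exception's traceback and printing each frame's local variables. This is a hedged sketch of the general technique, not ChatDBG's implementation, whose enrichment format (including surrounding source context) may differ:

```python
# Sketch: enrich a stack trace with per-frame local variables, using only
# the standard traceback machinery available on every exception object.

def enriched_trace(exc: BaseException) -> str:
    """Render each frame of an exception's traceback with its locals."""
    lines = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        lines.append(f"{frame.f_code.co_name} (line {tb.tb_lineno})")
        for name, value in frame.f_locals.items():
            lines.append(f"    {name} = {value!r}")
        tb = tb.tb_next
    return "\n".join(lines)

def lookup(record, key):
    return record[key]  # raises KeyError when `key` is absent

def main():
    record = {"name": "ada"}
    try:
        lookup(record, "user_id")
    except KeyError as exc:
        print(enriched_trace(exc))

main()
```

A trace like this shows not just *where* the program failed but *what the relevant values were*, which is exactly the extra context an LLM needs to reason about questions like "why is x null?".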

Evaluation and Effectiveness

The paper evaluates ChatDBG through two sets of experiments: one with Python programs written by students and another with unmanaged C/C++ programs. The evaluation focuses on ChatDBG's ability to identify the root cause of defects and generate fixes.

  1. Python Experiments:
    • Conducted across non-interactive scripts and interactive Jupyter notebooks.
    • Demonstrated that a single query produced an actionable bug fix 67% of the time, rising to 85% with one follow-up query, showcasing effective diagnostic capabilities with minimal input.
    • Highlighted that enriched stack traces and allowing the LLM to "take the wheel" were crucial for improving success rates.
  2. C/C++ Experiments:
    • Involved debugging well-known bugs from BugBench and BugsC++ suites.
    • ChatDBG identified the root cause of errors and suggested fixes for the proximate or underlying issues in 91% of the runs (Figure 2).

      Figure 2: Success rate of ChatDBG at fixing bugs in unmanaged C/C++ programs, demonstrating its effectiveness at proposing root-cause fixes.

Implications and Future Work

The integration of LLMs with traditional debuggers expands their capabilities well beyond simple runtime inspection. ChatDBG demonstrates potential for automated program repair by leveraging the real-world knowledge embedded in LLMs, proposing fixes even for bugs in previously unseen code.

Future work may focus on enhancing fault localization, integrating delta debugging, and leveraging time-travel debugging capabilities to facilitate state exploration over time. Additionally, further experiments with more diverse and complex bugs could refine interaction models between the debugging assistant and developers.

Conclusion

ChatDBG represents a significant advancement in debugging technology, showing how AI can transform conventional debugging into a powerful, interactive, and efficient dialogue. This research contributes to the field by demonstrating practical applications of LLMs in tasks that traditionally require manual effort and expertise.
