
What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

Published 11 Dec 2024 in cs.SE, cs.AI, and cs.LG | (2412.08098v2)

Abstract: Recent studies have demonstrated the outstanding capabilities of LLMs in software engineering tasks, including code generation and comprehension. While LLMs have shown significant potential in assisting with coding, they are perceived as vulnerable to adversarial attacks. In this paper, we investigate the vulnerability of LLMs to imperceptible attacks, in which hidden character manipulation in source code misleads LLMs' behaviour while remaining undetectable to human reviewers. We organise these attacks into four distinct categories and analyse their impact on code analysis and comprehension tasks: character reordering, invisible characters, character deletions, and homoglyphs. To comprehensively benchmark the robustness of current LLM solutions against these attacks, we present a systematic experimental evaluation of multiple state-of-the-art LLMs. Our experimental design introduces two key performance metrics: model confidence, measured using the log probabilities of the response, and response correctness. A set of controlled experiments is conducted using a large-scale collection of perturbed and unperturbed code snippets as the primary prompt input. Our findings confirm the susceptibility of LLMs to imperceptible coding character attacks, while different LLMs exhibit different negative correlations between perturbation magnitude and performance. These results highlight the urgent need for robust LLMs capable of maintaining reliable behaviour under imperceptible adversarial conditions. We anticipate that this work provides valuable insights for enhancing the security and trustworthiness of LLMs in software engineering applications.

Summary

  • The paper empirically studies how imperceptible character manipulations, like Unicode attacks, affect the code comprehension capabilities of different ChatGPT models (GPT-3.5 and GPT-4).
  • GPT-3.5 models show a linear performance drop as perturbations increase, while GPT-4's performance degrades rapidly under any perturbation type but appears less sensitive to the size of the perturbation budget.
  • Specific attack types like deletions and reorderings cause the most significant performance declines, highlighting critical vulnerabilities for LLMs used in software engineering tasks.

An Empirical Study of Code Comprehension Vulnerabilities in LLMs

The study in this paper addresses the vulnerabilities of LLMs in software engineering contexts, particularly focusing on adversarial attacks that leverage imperceptible character manipulation. As LLMs have become integral in assisting software developers with tasks spanning code generation, program repair, and vulnerability detection, understanding their robustness against sophisticated attack vectors is crucial. Despite their adeptness at natural language tasks, including those required for code comprehension, LLMs have exhibited susceptibility to subtle perturbations. This paper's contribution lies in empirically assessing how truly imperceptible perturbations, encoded via special Unicode characters, affect LLMs' performance, specifically targeting three ChatGPT models: two GPT-3.5 variants and one GPT-4 model.

The research findings demonstrate that the GPT-3.5 models exhibit a strong negative linear correlation between perturbation budget and performance outcomes, such as model confidence and correctness. In contrast, the GPT-4 model, while still negatively impacted by perturbations, reveals a distinct response pattern: the presence of any perturbation rapidly degrades performance, without a clear separation between perturbation budgets or categories. This discrepancy suggests that GPT-4 may possess inherent mechanisms that handle confounding prompts more rigidly, thereby avoiding false positives. It also points to stronger security layers within the GPT-4 architecture, albeit at the cost of handling legitimate yet complex inputs less flexibly.
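The confidence metric above is derived from the log probabilities of the model's response tokens. A minimal sketch of one common aggregation, the geometric mean of token probabilities, is shown below; this is an assumption for illustration, as the paper's exact formula is not reproduced here, and `response_confidence` is a hypothetical helper name.

```python
import math

def response_confidence(token_logprobs):
    """Aggregate per-token log probabilities into one confidence score.

    Uses the exponential of the mean token log probability, i.e. the
    geometric mean of the token probabilities, which lies in (0, 1].
    """
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Tokens near probability 1 (logprob near 0) yield a score close to 1.0,
# while uncertain tokens drag the score down.
confident = response_confidence([-0.01, -0.02, -0.05])
uncertain = response_confidence([-1.5, -2.0, -0.8])
```

Under a metric like this, a drop in confidence on perturbed snippets can be measured on the same scale across models, which is what enables the budget-versus-performance correlations reported above.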

The study also categorizes perturbations into four types: reordering, invisible characters, deletions, and homoglyphs, each affecting the models to varying extents. Notably, the deletion and reordering categories showed the most significant performance declines, highlighting specific vulnerabilities in current model architectures.
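The four categories can be illustrated with Unicode characters known to enable such attacks. The specific code points below are representative examples, not necessarily the exact set used in the paper's experiments.

```python
# Illustrative instances of the four imperceptible perturbation categories,
# applied to a small code snippet.
SNIPPET = "if admin: grant_access()"

# 1. Reordering: bidirectional control characters (here RLO U+202E and
#    PDF U+202C) can make the displayed order differ from the logical order.
reordered = SNIPPET.replace("admin", "\u202Eadmin\u202C")

# 2. Invisible characters: a zero-width space (U+200B) splits an identifier
#    with no visible change.
invisible = SNIPPET.replace("admin", "ad\u200Bmin")

# 3. Deletions: a backspace character (U+0008) after a visible character can
#    render as if that character had been deleted in some viewers.
deleted = SNIPPET.replace("admin", "adminX\u0008")

# 4. Homoglyphs: visually near-identical characters from another script,
#    e.g. Cyrillic 'а' (U+0430) in place of Latin 'a'.
homoglyph = SNIPPET.replace("admin", "\u0430dmin")

for name, text in [("reordering", reordered), ("invisible", invisible),
                   ("deletion", deleted), ("homoglyph", homoglyph)]:
    # Each perturbed string differs from the original at the byte level
    # while remaining hard to spot when rendered.
    print(name, text != SNIPPET)
```

Note that the homoglyph variant has the same character count as the original, which is part of what makes it imperceptible to a human reviewer while still changing the token sequence the model sees.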

From a theoretical perspective, these findings underscore the importance of enhancing LLMs' comprehension mechanisms, especially in discerning imperceptible perturbations that mimic authentic input. Practically, the results are invaluable for developers looking to deploy LLMs in software engineering environments, making them aware of these security vulnerabilities.
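One practical mitigation, orthogonal to the paper's evaluation, is to scan source code for suspect code points before it reaches the model. The sketch below uses a partial, illustrative set of bidi controls, zero-width characters, and backspace; it is not an exhaustive defence, and `scan_source` is a hypothetical helper name.

```python
import unicodedata

# Characters known to enable imperceptible-source attacks. Partial list,
# for illustration only.
SUSPECT = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # bidi embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
    "\u200B", "\u200C", "\u200D", "\uFEFF",            # zero-width characters
    "\u0008",                                          # backspace
}

def scan_source(code: str):
    """Return (line, column, description) for each suspicious character."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT:
                name = unicodedata.name(ch, hex(ord(ch)))
                findings.append((lineno, col, name))
    return findings

clean = "x = 1\ny = 2"
tainted = "x = 1\nif ad\u200Bmin: pass"
```

A scan like this catches the invisible-character and reordering categories directly; homoglyphs require a further mixed-script check, since the offending characters are ordinary printable letters.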

Future research should focus on developing LLMs that retain accuracy and task efficacy despite the presence of such perturbations. Introducing sophisticated context-parsing algorithms that simulate human intuition when handling corrupted inputs might bridge the gap between appearance and understanding. Moreover, exploration of LLM interpretability mechanisms could aid in developing models that explicate their reasoning processes, further aligning model output with user expectations.

In sum, this paper advances our understanding of LLMs, such as ChatGPT, in handling code-related tasks under adversarial conditions. It calls for continued investigation into refining LLMs to ensure security, reliability, and seamless integration into the workflows they are designed to augment. The exploration of imperceptible character attacks paves the way for more resilient AI systems capable of withstanding increasingly sophisticated threats in a vast array of applications.
