
Adversarial Math Word Problem Generation

Published 27 Feb 2024 in cs.CL and cs.AI (arXiv:2402.17916v3)

Abstract: LLMs have significantly transformed the educational landscape. As current plagiarism detection tools struggle to keep pace with LLMs' rapid advancements, the educational community faces the challenge of assessing students' true problem-solving abilities in the presence of LLMs. In this work, we explore a new paradigm for ensuring fair evaluation -- generating adversarial examples which preserve the structure and difficulty of the original questions aimed for assessment, but are unsolvable by LLMs. Focusing on the domain of math word problems, we leverage abstract syntax trees to structurally generate adversarial examples that cause LLMs to produce incorrect answers by simply editing the numeric values in the problems. We conduct experiments on various open- and closed-source LLMs, quantitatively and qualitatively demonstrating that our method significantly degrades their math problem-solving ability. We identify shared vulnerabilities among LLMs and propose a cost-effective approach to attack high-cost models. Additionally, we conduct automatic analysis to investigate the cause of failure, providing further insights into the limitations of LLMs.


Summary

  • The paper introduces an adversarial attack that transforms math word problems into Python ASTs, systematically altering numeric values.
  • The method enforces constraints like positivity and integer preservation to maintain the original problem's structure and difficulty.
  • Experiments reveal significant accuracy drops in LLMs, with up to a 100% attack success rate (ASR) on weaker models and an average improvement of over 60 ASR points compared with rephrasing-based methods.


Abstract and Introduction

The paper addresses the challenge of fairly assessing students' problem-solving abilities in the presence of LLMs. It explores a method for generating adversarial math word problems (MWPs) that LLMs cannot solve, yet which preserve the original problems' structure and difficulty. The approach leverages abstract syntax trees (ASTs) to systematically alter the numeric values in problems, revealing shared vulnerabilities among LLMs, and proposes a cost-effective strategy for attacking high-cost models.

Methodology

Problem Transformation and Adversarial Generation

The proposed method converts solvable MWPs into Python code and then into AST representations. This transformation enables the generation of adversarial examples by altering numeric values under constraints that retain the problem's difficulty.

Figure 1: Method Overview Given a MWP that an LLM can correctly solve, our method first transforms it into Python code... Despite this, we find that the resulting adversarial examples cause LLMs to predict incorrect answers.

  • Code to AST Conversion: The Python-generated solution is transformed into an AST, where variable nodes represent numeric values from the problem. Adversarial examples are generated by modifying these nodes under a set of constraints.
  • Constraints Implementation: Boolean constraints such as positivity, integer type, and proper fraction preservation are employed to ensure adversarial examples maintain logical problem structure.
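The code-to-AST pipeline above can be sketched with Python's standard `ast` module. This is a minimal illustration, not the paper's implementation: the `SOLUTION` snippet, the random value range, and the rejection-sampling loop are all assumptions; the concrete constraint checked here (the final answer stays a positive integer) stands in for the paper's fuller constraint set.

```python
import ast
import random

# Hypothetical Python solution for a MWP (the paper converts each
# problem into code like this before building the AST).
SOLUTION = """
apples = 12
eaten = 5
left = apples - eaten
"""

class NumericMutator(ast.NodeTransformer):
    """Replace integer literals in the AST with fresh random values."""

    def __init__(self, seed, low=1, high=50):
        self.rng = random.Random(seed)
        self.low, self.high = low, high

    def visit_Constant(self, node):
        # Only touch numeric literals; bool is a subclass of int, so exclude it.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            new = ast.Constant(value=self.rng.randint(self.low, self.high))
            return ast.copy_location(new, node)
        return node

def mutate(source, seed):
    tree = NumericMutator(seed).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # back to runnable Python (3.9+)

def mutate_until_valid(source, result_var, tries=100):
    """Rejection-sample mutations until the executed result satisfies the
    constraints -- here, simply that the answer is a positive integer."""
    for seed in range(tries):
        candidate = mutate(source, seed)
        env = {}
        exec(candidate, env)
        if isinstance(env[result_var], int) and env[result_var] > 0:
            return candidate
    raise RuntimeError("no constraint-satisfying mutation found")
```

Executing the mutated code and checking the result is what lets structural constraints (positivity, integrality) be enforced without re-deriving the problem's logic.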

Results

Efficacy on Various LLMs

Experiments conducted on eight different LLMs, including GPT-4 and MetaMath 70B, demonstrated a significant attack success rate (ASR). Under the most restrictive generation method, M3, weaker models like Llama 2 13B consistently produced incorrect answers.


Figure 2: Human Evaluation (Left)... Transferability (Right)...

The study highlights a stark accuracy drop across all models when faced with adversarial examples, with Mistral 7B and CodeLlama 34B showcasing a 100% ASR. This suggests a profound vulnerability in LLMs’ numeric reasoning capabilities.
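The ASR figures above can be made concrete with a small helper. Note this uses the conventional definition (the fraction of originally solved problems that the attack flips to an incorrect answer); the paper's exact formula is an assumption here.

```python
def attack_success_rate(orig_correct, adv_correct):
    """ASR over the problems the model originally solved: the fraction
    that the adversarial version flips to an incorrect answer.
    Both arguments are parallel lists of booleans, one per problem."""
    flipped = sum(o and not a for o, a in zip(orig_correct, adv_correct))
    eligible = sum(orig_correct)  # only originally-solved problems count
    return flipped / eligible if eligible else 0.0
```

A 100% ASR thus means every problem the model could solve in its original form became unsolvable after the numeric edits.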

  • Baseline Comparison: The proposed method surpasses previous rephrasing attacks in degrading model performance by over 60 ASR points on average.
  • Efficient Targeting of High-Cost Models: The attack on GPT-4 achieved similar ASR while reducing API request calls by up to 90% using adversarial examples from cheaper models, indicating a practical approach to resource-intensive scenarios.
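The cost-saving idea in the last bullet can be sketched as a filtering loop: only candidates that already fool a cheap model are forwarded to the expensive one. The `cheap_solve` and `expensive_solve` callables below are placeholders standing in for API calls to the two models; the filtering logic is a simplified reading of the paper's strategy.

```python
def transfer_attack(candidates, gold_answers, cheap_solve, expensive_solve):
    """Screen adversarial candidates with a cheap model before spending
    queries on the expensive target model.

    candidates:   adversarial problem texts
    gold_answers: the correct numeric answer for each candidate
    *_solve:      placeholder callables mapping a problem to a model's answer
    Returns the candidates that fooled the expensive model, plus the
    number of expensive-model queries actually issued.
    """
    successes, expensive_calls = [], 0
    for problem, gold in zip(candidates, gold_answers):
        if cheap_solve(problem) == gold:
            continue  # cheap model solved it; unlikely to fool the target
        expensive_calls += 1
        if expensive_solve(problem) != gold:
            successes.append(problem)
    return successes, expensive_calls
```

Because vulnerabilities transfer between models, most candidates that survive the cheap filter also fool the expensive model, which is what lets the query budget shrink sharply.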

Analysis and Discussion

Human Evaluation and Feature Impact

The human evaluation confirmed that the adversarial examples generated by M3 preserve coherence and difficulty. Regression analysis further showed that LLM performance depends on specific numerical ranges and on operation complexity.

Transferability of Attacks

Weaker models shared common vulnerabilities, with consistent attack transferability observed between them. This points to universal weaknesses in LLMs' arithmetic problem-solving and highlights concrete targets for robustness improvements.

Conclusion

The paper presents a novel adversarial attack methodology to stress-test LLMs' mathematical capabilities, ensuring educational integrity through robust problem generation. This approach not only exposes inherent limitations in current AI models but also informs future developments aimed at enhancing model robustness against adversarial influences. Further work should explore adapting these methods for more complex problem types, potentially aligning LLM capabilities with human-like reasoning standards in educational settings.
