- The paper introduces a bilevel optimization framework that coordinates prompt injections and poisoned text to boost attack success and stealth in RAG systems.
- It demonstrates robust performance across LLM architectures, achieving high attack success rates even with minimal poisoned text injections.
- The framework maintains normal response accuracy during non-trigger states, highlighting significant implications for reinforcing AI security defenses.
Coordinated Prompt-RAG Attacks on LLMs
Introduction
The integration of Retrieval-Augmented Generation (RAG) with LLMs has become increasingly prevalent due to its potential to address the inherent limitations of LLMs, such as hallucinations and outdated information. The innovative mechanism of RAG combines LLMs with external retrieval systems, enhancing the capability to provide accurate and up-to-date responses. However, this integration introduces new security vulnerabilities that adversaries could exploit.
Concept and Implementation of PR-Attack
The paper introduces the coordinated Prompt-RAG (PR) attack framework, a novel approach that leverages a bilevel optimization process to enhance the attack effectiveness against RAG-based LLMs. The PR-attack paradigm stems from recognizing that existing methods suffer from inefficiencies when limited poisoned texts can be injected into the knowledge database. These methods also struggle with poor stealth due to detectable anomalies and rely on heuristic-based solutions that lack formal optimization techniques.
The PR-attack method formulates the attack as a bilevel optimization problem involving two critical components: the optimization of prompt injections and the generation of poisoned texts to be injected into the knowledge base. The proposed attack process is structured as follows:
- Poisoned Text Injection: A limited number of strategically crafted poisoned texts are injected into the knowledge database.
- Trigger Activation: A backdoor trigger is embedded within the prompt to initiate the LLM’s response deviation during sensitive periods without affecting normal operations.
- Stealth Maintenance: The design ensures that during normal periods, the backdoor trigger remains inactive, preventing users from realizing the compromise.
Figure 1: Overview of the proposed PR-attack, illustrating the stealth mechanism employed during both sensitive and normal periods.
Bilevel Optimization Framework
The core of the PR-attack methodology is in the bilevel optimization framework. The upper-level optimization focuses on ensuring that when the backdoor trigger is activated, the LLM generates the maliciously intended target answer. Simultaneously, when the trigger is inactive, the LLM should produce the correct answer to maintain high stealth.
The lower-level optimization addresses the retrieval problem, where the task is to select the top-k most relevant texts from the poisoned knowledge base for each target query. The intricate design of this framework facilitates highly effective and stealthy attack mechanisms.
Experimental Evaluation
Extensive experiments demonstrate the superiority of the PR-attack across various datasets and LLM architectures, surpassing existing methods like PoisonedRAG and GGPP in terms of both attack success rate (ASR) and stealth. The PR-attack achieves formidable ASRs even when constrained to a minimal number of poisoned texts, showcasing its robust applicability and effectiveness.


Figure 2: Performance comparison of PR-attack against state-of-the-art methods, demonstrating elevated ASR across multiple LLMs and datasets.
The assessment also highlights that PR-attack maintains an ACC comparable to non-poisoned scenarios when the trigger remains inactive, further reinforcing the argument for its advanced stealth capabilities.


Figure 3: Illustration of PR-attack's stealth effectiveness, maintaining high ACC when the trigger is not activated across tested LLMs.
Sensitivity Analysis
In-depth analysis indicates that the performance of the PR-attack remains stable across variations in parameters such as the length of poisoned texts (b) and the number of tunable tokens in soft prompt (n). This robustness empowers the PR-attack with flexibility and reliability across changing configurations and deployment scenarios in real-world applications.





Figure 4: Examining the impact of variations in the length of poisoned text b on PR-attack performance.




Figure 5: Exploring how changes in the number of tunable tokens n influence the efficacy of PR-attack.
Conclusion
The introduction of the PR-attack method, with its novel bilevel optimization framework, enhances the execution of coordinated attacks on RAG-based LLMs, presenting an elevated threat level in adversarial settings. The framework’s capacity for indirect activation and deactivation contributes significantly to stealth, a key element for evading detection in deployed systems. As AI continues to integrate more deeply into diverse domains, the implications of these findings necessitate a concerted focus on developing more robust defenses against such sophisticated attack vectors. The PR-attack highlights critical vulnerabilities in RAG settings that require immediate attention from researchers and practitioners aiming to secure AI implementations against adversarial threats.