Reliable general-purpose defense against prompt injection

Develop a reliable, general-purpose defense against prompt injection that works consistently across diverse large language model and agentic application contexts, and remains effective against adaptive adversaries that actively adjust their attack strategies.

Background

The paper evaluates state-of-the-art prompt injection defenses (PromptShields, Prompt-Guard2, and DataSentinel) and shows that adversarial reward-hacking payloads embedded in telemetry can evade these detectors at high rates. This highlights the lack of a robust, general solution to prompt injection that reliably protects LLM-driven agents across application domains.

Because AIOps operates on structured telemetry rather than unstructured text, the authors argue that general-purpose defenses trained primarily on free-form language may fail to generalize. They introduce AIOpsShield, a domain-specific sanitization approach that leverages the structured, enumerable nature of telemetry to block injection in AIOps. Nevertheless, they explicitly note that in the general case—especially against adaptive adversaries—a comprehensive prompt injection defense remains an open problem.

References

Despite many proposals by the academic community, there is still no (reliable) solution for prompt injection that works consistently in all contexts. In the general case, especially against adaptive adversaries, it continues to be an open problem.

— When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation (2508.06394 - Pasquini et al., 8 Aug 2025) in Section “Securing AIOps”, Subsection “AIOpsShield”

Reliable general-purpose defense against prompt injection

Background

References

Related Problems