- The paper introduces the CIK taxonomy to classify persistent state poisoning in AI agents, showing attack success rates spike from a 24.6% baseline to as high as 89.2%.
- The study experimentally evaluates attacks across four leading LLMs, demonstrating that poisoning any single dimension of Capability, Identity, or Knowledge significantly raises threat levels.
- Results highlight an evolutionโsafety tradeoff, revealing that current defenses can reduce but not eliminate the risks of persistent state modifications without hindering adaptive functionality.
Real-World Safety Analysis of Persistent State Attacks in OpenClaw
Introduction
"Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw" (2604.04759) presents the first empirical, real-world evaluation of persistent state poisoning threats in personal AI agents, instantiated on OpenClawโa platform with full system privileges and integrations spanning critical resources including Gmail, Stripe, and the local file system. The studyโs core contribution is the CIK taxonomy, which unifies agent vulnerabilities into three failure dimensions: Capability, Identity, and Knowledgeโeach corresponding to a distinct aspect of the persistent agent state. The paper systematically demonstrates that state-poisoning attacks on any CIK dimension dramatically amplify the attack success rate (ASR) against advanced LLM backbones, and further that these vulnerabilities are architectural, not model-specific.
The CIK Taxonomy and Attack Vectors
The CIK model decomposes the persistent, self-evolving state of agentic systems into:
- Capability: Executable skills (script and documentation files) that define agent actions.
- Identity: Persona, configurations, and trust anchors (e.g., who the agent trusts).
- Knowledge: Long-term memories and behavioral records.
The persistent state files are not ephemeralโthey are loaded and further modified automatically across interactions, providing a high-value target for adversaries who achieve even temporary access.
Figure 1: Persistent state in OpenClaw is organized along the CIK axes, each enabling distinct and highly effective poisoning mechanisms, tested across four backbone LLMs.
The attack protocol spans two temporally-separated phases: (1) injection into a target file (persistent poisoning), and (2) a subsequent session in which the malicious state is triggered by an external prompt or event, yielding real-world harm.
Figure 2: Two-phase attack protocolโstate poisoning (phase one) is decoupled from downstream triggering (phase two), which ensures persistence beyond the initial intrusion window.
Experimental Study: Poisoning Efficacy and Defense Shortcomings
The CIK taxonomy is realized concretely on OpenClaw, with direct mappings from each dimension to persistent workspace files. The experimental suite covers 12 real-world harms split across privacy violations and irreversible destructive actions (e.g., credential exfiltration, unauthorized refunds, workspace destruction).
Quantitative experiments were conducted across four leading LLM backbones: Claude Sonnet 4.5, Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4. Each was subject to 88 test conditions, encompassing both clean baselines and isolated CIK poisoning attempts.
Results: Structural Vulnerability
Key finding: Poisoning any single CIK dimension increases mean ASR from a baseline of 24.6% to the 64โ74% range. Sonnet 4.5 exhibits ASRs as high as 89.2% post-Knowledge poisoning; even the most robust model (Opus 4.6) more than triples its baseline ASR when poisoned.
These results are robust across both injection (file modification) and trigger (action execution) phases. Notably, executable Capability payloads (e.g., hidden shell scripts) bypass LLM reasoning entirely, resulting in deterministic attack success except for partial resistance in GPT-5.4.
Case Studies: Exploitation of CIK Dimensions
To exemplify the operational mechanics and impact, the study provides concrete poisoning scenarios for each CIK axis:
Figure 3: Concrete attacksโ(Left, Knowledge): Fabricated batch refund โhabitsโ make dangerous financial actions seem routine. (Middle, Identity): A planted backup destination silently redirects credentials. (Right, Capability): Hidden destructive payloads effect irreversible workspace loss.
- Knowledge: False behavioral histories (e.g., in MEMORY.md) cause the agent to treat risky operationsโsuch as batch refundsโas routine and uncontested.
- Identity: Malicious anchors (e.g., a backup URL in USER.md) are accepted as trusted without confirmation, enabling data exfiltration.
- Capability: Executable payloads inserted into skill files are executed silently, enabling total compromise or denial-of-service (e.g.,
rm -rf).
Defense Evaluation and EvolutionโSafety Tradeoff
Three defense strategiesโeach aligned to a CIK dimensionโwere quantitatively evaluated. While defenses reduce baseline ASR, none are sufficient to eliminate attacks within their targeted dimension, and Capability-targeted attacks remain resistant (ASR: 63.8% even under the strongest defense with GuardianClaw loaded pre-session).
A file-protection strategy drastically reduces both attack and legitimate update acceptance rates (~97% reduction in both), surfacing a fundamental evolutionโsafety tradeoff: mechanisms that block attacks also paralyze the agentโs adaptive functionality, which is intrinsic to its value proposition.
Theoretical and Practical Implications
This workโs central implication is that persistent state poisoning in LLM-driven agents is an architectural weakness intrinsic to the โevolution-firstโ paradigm. Improvements in backbone model alignment are insufficient: attacks that exploit out-of-band code paths (Capability) or socially-engineered trust (Identity) persist across even the current model frontier.
The inability of LLMs to reliably differentiate benign from malicious persistent state changes (especially as context size/complexity grows) suggests that prompt/knowledge-level safety interventions will remain incomplete. Instead, systematic architectural mitigationsโruntime code signing, sandboxed skill execution, mandatory human-in-the-loop approval for persistent state changes, and explicit separation of agent memory from executable privilegesโappear essential for robust security.
Discussion and Future Directions
By providing a unified taxonomy and a live-systems evaluation, this study sets a new standard for agentic security benchmarking. The CIK decomposition generalizes to all self-evolving LLM agents, and its demonstration on OpenClaw (with >220,000 internet-exposed instances) (2604.04759) illustrates immediate risk at scale.
Future work should extend to chained, multi-axis attacks, longitudinal naturalistic deployments, automated attack generation, and evaluation of systematic architectural defenses (e.g., OS-level sandboxing as opposed to LLM-based refusals). Production-grade agent deployments integrating CIK-aware safety architectures are required to address these vulnerabilities at ecosystem scale.
Conclusion
This paper demonstrates that persistent state poisoningโacross Capability, Identity, and Knowledgeโis a structural, cross-model threat to all evolution-first LLM-based agents. The current generation of defenses is insufficient, and more systematic, architectural safeguards are indispensable to reconcile personalization with real-world safety. The CIK taxonomy provides a foundation for future evaluation, mitigation, and design of secure agentic systems.