- The paper proposes GAAP to deterministically enforce user-specified data confidentiality in agent workflows, eliminating reliance on untrusted components.
- It employs static code analysis, taint tracking, and a persistent permission database to mitigate prompt injections and prevent unauthorized data leaks.
- GAAP maintains high task utility with a median latency increase of 13%, blocking all tested disclosure attacks while staying within a few percentage points of unconstrained agents on task completion.
GAAP: Deterministic Enforcement of User Data Confidentiality in AI Agent Execution
Introduction and Motivation
Agentic AI systems that automate user workflows—ranging from personal productivity to enterprise integration—pose a severe confidentiality risk by mediating access to sensitive user data for tool usage, web automation, and system integration. Control over disclosures to AI model providers and external tool endpoints is especially challenging given susceptibilities to prompt injections, model compromise, and malicious or untrustworthy agent code. "An AI Agent Execution Environment to Safeguard User Data" (2604.19657) proposes GAAP (Guaranteed Accounting for Agent Privacy), an agent execution environment that deterministically enforces user-specified data disclosure policies without trusting user prompts, agent code, or the model provider.
System Architecture and Threat Model
GAAP adopts a security-centric architecture that moves data access and disclosure mediation out of the agent and the LLM. Data, policy, and enforcement logic are isolated in a deliberately minimal trusted computing base on the user side. Every other component of the agent workflow, including the LLM prompt, context, and generated code, is considered untrusted. The design relies on mediated execution of code artifacts, augmented Information Flow Control (IFC), persistent disclosure state, fine-grained policy learning, and tool/service annotation.
Figure 1: GAAP’s Architecture enabling deterministic and persistent tracking and enforcement of user-specified disclosure policies throughout agent execution and tool usage.
The system interposes on all interactions between the agent (running LLM-generated code artifacts) and external APIs/MCP servers. All access to private user data is routed through a private data DB, and external disclosures are checked against a persistent, user-modifiable permission DB. GAAP maintains a disclosure log to support longitudinal confidentiality reasoning, enabling transitive taint propagation across tasks and tool invocations, even in multi-shot agent workflows. Granularity is enhanced using an extensible annotation framework for tool output semantics and party associations.
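The interaction between the permission DB and the disclosure log can be sketched as follows. This is a minimal illustration, not the paper's implementation: `DisclosureLog`, `mediate`, and the party names are hypothetical, and the rule that a party's output inherits every item previously disclosed to that party is one conservative assumption about how transitive taint could be derived from the log.

```python
from collections import defaultdict


class DisclosureLog:
    """Hypothetical log of which private data items each external party has seen."""

    def __init__(self):
        self._seen = defaultdict(set)  # party -> set of data item names

    def record(self, party, items):
        self._seen[party].update(items)

    def taints_of_output_from(self, party):
        # Conservative assumption: anything a party returns may encode
        # everything that was previously disclosed to it.
        return frozenset(self._seen[party])


def mediate(log, allowed, payload_taints, party):
    """Allow a disclosure only if every (item, party) pair is authorized."""
    missing = {item for item in payload_taints if (item, party) not in allowed}
    if missing:
        return False, missing
    log.record(party, payload_taints)
    return True, set()


# Assumed permission DB contents: the phone number may go to the calendar API.
allowed = {("phone", "calendar_api")}
log = DisclosureLog()

# Task 1: the phone number is sent to the calendar API (authorized).
ok1, _ = mediate(log, allowed, {"phone"}, "calendar_api")

# Task 2: output returned by the calendar API is forwarded to an email API.
# The output inherits the "phone" taint, which is not authorized for email_api.
inherited = log.taints_of_output_from("calendar_api")
ok2, missing2 = mediate(log, allowed, inherited, "email_api")
```

In this sketch the second disclosure is refused even though the agent never reads the phone number directly in the second task, which is the kind of cross-task flow a persistent log is meant to catch.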
Figure 2: One step of GAAP agent execution, detailing the inline mediation, auditing, and user-driven policy expansion at external data disclosure points.
Enforcement Mechanisms
GAAP relies on classic and extended IFC principles, applied at the code artifact level. The agent generates code to fulfill a task, guided by the user’s prompt and a GAAP system prompt that specifies the available data and tool schemas. This code is statically analyzed and instrumented for taint tracking: every value read from the private data DB or returned by a tool API is tagged, and the tags are checked at each disclosure site. No data flow, whether direct, indirect, or transitive, reaches an external party unless it has been authorized for the specific (data item, external party) pair.
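A simplified picture of tagging at reads and checking at disclosure sites is given below. It is a sketch under stated assumptions: it uses a runtime wrapper rather than the static analysis and instrumentation described above, it tracks only explicit flows through string concatenation, and the names `Tainted`, `read_private`, `disclose`, and `DisclosureBlocked` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A string value carrying the names of the private items it derives from."""
    value: str
    labels: frozenset = frozenset()

    def __add__(self, other):
        if isinstance(other, Tainted):
            return Tainted(self.value + other.value, self.labels | other.labels)
        return Tainted(self.value + str(other), self.labels)

    def __radd__(self, other):
        # Handles plain_string + Tainted.
        return Tainted(str(other) + self.value, self.labels)


class DisclosureBlocked(Exception):
    pass


def read_private(name, store):
    """Every read from the private store is tagged with the item's name."""
    return Tainted(store[name], frozenset({name}))


def disclose(payload, party, allowed):
    """Disclosure site: every label must be authorized for this party."""
    labels = payload.labels if isinstance(payload, Tainted) else frozenset()
    denied = sorted(label for label in labels if (label, party) not in allowed)
    if denied:
        raise DisclosureBlocked(f"{denied} not authorized for {party}")
    return payload.value if isinstance(payload, Tainted) else payload


store = {"ssn": "123-45-6789", "phone": "555-0100"}  # made-up example data
allowed = {("phone", "email_api")}

msg = "Call me at " + read_private("phone", store)
sent = disclose(msg, "email_api", allowed)  # permitted: only "phone" is attached

leak = msg + " SSN " + read_private("ssn", store)  # labels: phone and ssn
try:
    disclose(leak, "email_api", allowed)
    blocked = False
except DisclosureBlocked:
    blocked = True
```

The check depends only on the labels attached to the payload, not on what the surrounding text claims the data is, which is why relabeling an SSN as a phone number in the prompt does not change the outcome in this model.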
Policies are learned dynamically and persist across tasks: the first time a given disclosure event occurs, GAAP asks the user for an allow/deny decision and records it in the permission DB, so repeated queries decrease over time. Enforcement does not depend on the agent behaving correctly. Mediation remains deterministic and fully automated under agent errors, model hallucinations, or adversarially controlled agent actions, unlike policy systems that assume either trusted prompts or static predeployment policies.
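The ask-once behavior can be illustrated with a small persistent table keyed by (data item, external party). This is a hedged sketch: the schema, the `PermissionDB` class, and the `ask_user` callback are assumptions made for illustration, and an in-memory SQLite database stands in for whatever storage GAAP actually uses.

```python
import sqlite3


class PermissionDB:
    """Hypothetical store of allow/deny decisions per (item, party) pair."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS permissions ("
            "item TEXT, party TEXT, allowed INTEGER, "
            "PRIMARY KEY (item, party))"
        )
        self.prompts = 0  # how many times the user was actually asked

    def check(self, item, party, ask_user):
        row = self.conn.execute(
            "SELECT allowed FROM permissions WHERE item = ? AND party = ?",
            (item, party),
        ).fetchone()
        if row is not None:
            return bool(row[0])  # decision already recorded, no prompt
        # First encounter: ask the user and persist the answer.
        self.prompts += 1
        decision = bool(ask_user(item, party))
        self.conn.execute(
            "INSERT INTO permissions (item, party, allowed) VALUES (?, ?, ?)",
            (item, party, int(decision)),
        )
        self.conn.commit()
        return decision


def ask_user(item, party):
    # Stand-in for an interactive prompt: this user shares only the phone number.
    return item == "phone"


db = PermissionDB()
results = [
    db.check("phone", "email_api", ask_user),
    db.check("phone", "email_api", ask_user),  # answered from the DB
    db.check("ssn", "email_api", ask_user),
    db.check("ssn", "email_api", ask_user),    # denial is remembered too
]
```

Four checks trigger two prompts here, because both allow and deny decisions are stored. Passing a file path instead of `":memory:"` would keep decisions across runs.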
Evaluation: Security and Utility
Privacy Guarantees
GAAP is evaluated on a purpose-built suite emphasizing fine-grained permission enforcement and prompt injection resilience. Three canonical prompt injection attacks are considered: indiscriminate data leakage (e.g., SSN in every tool call), contextual leakage (phone number inserted into emails), and semantic mislabeling (SSN as phone number). Baselines include naive agents, LLM-judge-based mediation, and static policy or contextual LLM-policy approaches (e.g., Conseca).
Figure 3: Success rates of privacy disclosure attacks. GAAP blocks all tested adversarial data disclosures; other approaches permit significant leakage, especially against adaptive attacks and semantic mislabelings.
GAAP blocks 100% of the tested attacks, including adversarially crafted contexts where both "LLM-judge" and Conseca fail, most notably in transitive and semantic swap settings. The study highlights that judge-based or context-limited policy generation is insufficient, because prompt injection and semantic confusion bypass these methods; GAAP's deterministic enforcement eliminates these classes of attack.
Task Utility
GAAP's strict dataflow enforcement mechanisms are engineered to maintain agent utility. When benchmarked against non-private and privacy-enforcing agent baselines across AgentDojo and custom tasks, GAAP's task completion rates are within about 5 percentage points of context-unconstrained agents (e.g., 76% vs 81% in custom benchmarks). GAAP also outperforms LLM-judge and static-context policy-based systems in utility on many tasks, as these baselines tend to over-block or under-block because of incomplete context or aggressive adversary adaptations.
Figure 4: Utility of GAAP and baselines. Despite deterministic privacy enforcement, GAAP retains high utility, surpassing or matching alternative guardrail systems.
System Costs
Mediation and the persistent policy mechanisms introduce modest overheads. Median latency for GAAP is 13% higher than for a non-private agent. Token costs shift: input context stays consistent and compact because of the fixed system prompts, while output token usage is higher because the agent emits code artifacts. Aggregated over full task suites, GAAP is reported to cost less overall than non-private approaches, which is attributed to reduced context churn.
Figure 5: Input and output LLM token costs, showing GAAP’s consistent input overheads but elevated output stemming from code artifact generation.
Figure 6: Average latency overheads for GAAP are limited, supporting practical usage at near-interactive speeds.
User effort for permission queries is amortized by the persistent permission DB: query rates decay substantially over the system's lifetime as policies stabilize. The disclosure log and the annotation mechanisms together prevent indirect leakage, even after multi-step or multi-user workflows.
Discussion and Implications
GAAP delivers a rigorously specified, user-backed framework for deterministic enforcement of privacy in AI agent ecosystems. It obviates trust in model providers, agent code, or prompt provenance, thereby mitigating prompt injection, agent hallucinations, and adversarial tool flows. The system aligns with adversary-resilient, minimal trust architectures and offers persistent enforcement—crucial for regulatory compliance and user accountability.
From a practical standpoint, GAAP's explicit permission model could support scalable deployment in consumer software, enterprise SaaS automation, and regulated workflows. The annotation ecosystem supports community-driven tool validation. Integrations with private memory systems and persona-driven AI decisions for permissions could further enhance usability. Limitations include the initial user burden of establishing permissions and the need for robust annotation curation to avoid over-tainting and under-tainting. Log growth and the efficiency of taint management are left to future work.
Theoretically, GAAP provides a reference paradigm for constructing agentic systems where confidentiality can be formally guaranteed even in adversarial, continuously evolving environments, and paves the way for general-purpose privacy-preserving AI agents with both deterministic enforcement and longitudinal policy tracking.
Conclusion
"An AI Agent Execution Environment to Safeguard User Data" introduces a comprehensive system—GAAP—which deterministically enforces user data confidentiality policies over untrusted agentic workflows. Disclosures are strictly mediated and audited, including across persistent, multi-participant, and multi-shot executions. The implementation achieves strong empirical privacy, practical utility, and manageable overheads, setting a new standard for agent confidentiality enforcement in adversarial contexts (2604.19657).