- The paper introduces and rigorously defines causality laundering as a denial-feedback attack vector exploiting tool invocation denials.
- It presents ARM, a denial-aware provenance enforcement system that tracks denied actions and uses counterfactual analysis to block implicit leakage.
- Empirical evaluations show ARM effectively blocks attacks missed by flat provenance tracking with enforcement latency under 1 ms per tool call.
Context and Problem Statement
Tool-calling LLM agents increasingly mediate sensitive interactions—reading private records, invoking third-party services, and orchestrating real-world effects. Existing security paradigms surrounding these agents generally focus enforcement on direct data flows: tracking and controlling information that is returned by tools and subsequently used by the agent. However, this approach omits denial-induced feedback channels, which arise when agents observe that a particular tool invocation is blocked (denied) by an enforcement layer. This paper characterizes a previously under-explored attack vector—causality laundering—in which an attacker exploits the agent’s observations of denials to exfiltrate or infer sensitive system properties in subsequent tool calls. The authors formalize this phenomenon and demonstrate that flat provenance tracking is inadequate for defending against this class of attacks.
The paper introduces and rigorously defines causality laundering: an adversary-controlled probe triggers a protected tool call, observes that it is denied (and possibly the denial’s reason), and synthesizes follow-up tool calls whose parameters encode information derived from the denial. Crucially, no direct data flows from the denied resource to the exfiltration channel; rather, the leakage passes implicitly through the denial event itself and the agent’s internal state. Three major attack variants are delineated:
- Denial inference exfiltration: Single denial outcome infers a sensitive fact (e.g., resource existence).
- Multi-probe fingerprinting: The agent aggregates patterns over multiple denies, enabling adversarial inference of system policy or asset structure.
- Laundered composition: Denial-induced signals are “laundered” through benign tool calls or intermediate computations, obfuscating the direct causal path.
The authors formalize a causality laundering attack in terms of the provenance graph, asserting that a downstream, observable and privileged action is causally influenced by a prior denial, with no explicit successful data-flow connection.
Limitations of Existing Defenses
The paper reviews related techniques and demonstrates their insufficiency for denial-feedback leakage:
- Flat taint and information-flow systems (e.g., FIDES) model only explicit flows; denials are non-events with respect to provenance, so selective exfiltration via denial signals is invisible.
- Graph-based dependency tracking (e.g., PCAS) only connects successful tool calls—not attempted-but-denied actions—to downstream effects.
- Causal attribution via replay or ablation (e.g., AgentSentry, CausalArmor) is oriented toward detecting untrusted content injections but does not treat enforcement-generated denial events as possible sources of downstream influence.
A central claim is that none of these models surface denied actions as first-class events with potential for downstream causality, leading to an unenforced attack surface.
ARM: Denial-Aware Provenance and Enforcement Design
To mitigate this class of attacks, the paper proposes the Agentic Reference Monitor (ARM)—a deterministic, runtime enforcement layer that explicitly tracks denied actions in a session-level provenance graph. Key architectural choices and features include:
- Complete boundary mediation: ARM interposes on all tool invocations and responses, presenting itself as an MCP proxy.
- Provenance graph extension: Enriches standard provenance graphs with DeniedAction nodes and Counterfactual edges linking any subsequent tool call temporally following a denial, conservatively marking these as potentially influenced.
- Trust propagation via an integrity lattice: All tool call inputs (including derived field-level values and denial-induced contexts) transitively propagate trust, with the minimum trust dominating.
- Layered enforcement pipeline: Rigid staged checks, including unconditional boundaries, provenance-based trust and counterfactual queries, auto-generated schema constraints, and immutable operator-specified capability tokens.
- Tamper-evident audit: Hash-chained logs for enforcement events ensure forensic verifiability.
ARM's graph-aware Layer 2 (L2G) implements enforcement based on both explicit data dependencies and causality-laundering-induced counterfactual edges. Any tool call reachable, via the graph, from a denied action is subject to denial unless explicit policy overrides are present.
Empirical Evaluation
Three explicit attack scenarios, constructed to stress distinct provenance dimensions, provide a differential test against a flat provenance baseline:
- Causality laundering (denial inference exfiltration): Flat taint misses exfiltration via denial; ARM’s counterfactual edge blocks it.
- Transitive taint chain: Data is laundered through benign tool calls; only ARM’s reachability analysis detects the attack.
- Mixed-provenance field exploit: Distinct trust levels for structured result fields; only ARM’s field-level provenance can distinguish a risky action.
In all cases ARM blocks attacks missed by the flat baseline, with enforcement latency under 1 ms per tool call, demonstrating technical feasibility and an improved security posture.
Implications and Future Challenges
This denial-aware, provenance-native enforcement model represents a marked advance in capturing implicit feedback channels previously ignored in LLM security enforcement. The ARM mechanism generalizes to a wide spectrum of tool-calling agent platforms that embrace boundary mediation, graph-based reasoning, and deterministic runtime enforcement.
However, several empirical and theoretical limitations are acknowledged:
- Heuristic false positives: Temporal adjacency is an over-approximation for causal influence; benign post-denial actions may be denied.
- Coverage: Scenarios are crafted and limited; benchmark-scale evaluation (e.g., AgentDojo, InjectAgent) is still required for coverage and false positive analysis.
- Multi-agent and compositional reasoning: Extension to multi-agent causal chains and delegation is left for future work.
- Declassification: Safe and auditable trust upgrades for sanitized values still present an open problem.
ARM is not a solution to all implicit flow or covert channel issues; the practical import is that it enables enforcement over a previously unmodeled and real-world-exploited channel, thus raising the bar for offensive capability in LLM agent deployments.
Theoretical and Practical Impact
From a theoretical perspective, this work aligns security policy enforcement models for LLM agents with classical information-flow and causality literature (Denning’s lattice model, Pearl’s counterfactual causality), expanding the space of enforceable provenance to encapsulate implicit denial-induced flows. Practically, the approach can be adopted across LLM agent infrastructures with centralized mediation, strong audit requirements, and composable policy enforcement.
Conclusion
This paper makes a technically significant contribution by identifying and formalizing denial-feedback leakage (causality laundering) as a concrete operational risk in tool-calling LLM agents. By representing denied tool invocations as first-class provenance events and enforcing downstream policies over counterfactual influence, the ARM architecture closes a critical loophole unaddressed by flat data-flow enforcement. As LLM-driven automation grows in strategic importance, such advances in runtime agent governance will play an increasing role in practical system security and compliance. The approach sets a new standard for what must be tracked and enforced in provenance-driven agent security and provides a clear foundation for the extension of these ideas to more complex agentic systems and workflows.
Reference: "Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents" (2604.04035)