Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

Published 5 Apr 2026 in cs.CR and cs.AI | (2604.04035v1)

Abstract: Tool-calling LLM agents can read private data, invoke external services, and trigger real-world actions, creating a security problem at the point of tool execution. We identify a denial-feedback leakage pattern, which we term causality laundering, in which an adversary probes a protected action, learns from the denial outcome, and exfiltrates the inferred information through a later seemingly benign tool call. This attack is not captured by flat provenance tracking alone because the leaked information arises from causal influence of the denied action, not direct data flow. We present the Agentic Reference Monitor (ARM), a runtime enforcement layer that mediates every tool invocation by consulting a provenance graph over tool calls, returned data, field-level provenance, and denied actions. ARM propagates trust through an integrity lattice and augments the graph with counterfactual edges from denied-action nodes, enabling enforcement over both transitive data dependencies and denial-induced causal influence. In a controlled evaluation on three representative attack scenarios, ARM blocks causality laundering, transitive taint propagation, and mixed-provenance field misuse that a flat provenance baseline misses, while adding sub-millisecond policy evaluation overhead. These results suggest that denial-aware causal provenance is a useful abstraction for securing tool-calling agent systems.


Authors (1)

Mohammad Hossein Chinaei

Summary

The paper introduces and rigorously defines causality laundering as a denial-feedback attack vector exploiting tool invocation denials.
It presents ARM, a denial-aware provenance enforcement system that tracks denied actions and uses counterfactual analysis to block implicit leakage.
In a controlled evaluation on three attack scenarios, ARM blocks attacks that flat provenance tracking misses, with enforcement latency under 1 ms per tool call.

Denial-Feedback Information Flows in Tool-Calling LLM Agents: Formalization and Mitigation

Context and Problem Statement

Tool-calling LLM agents increasingly mediate sensitive interactions: reading private records, invoking third-party services, and orchestrating real-world effects. Existing security approaches for these agents generally enforce policy on direct data flows, tracking and controlling information that tools return and the agent subsequently uses. This omits denial-induced feedback channels, which arise when an agent observes that an enforcement layer has blocked (denied) a particular tool invocation. The paper characterizes an under-explored attack vector, causality laundering, in which an attacker exploits the agent's observations of denials to infer sensitive system properties and exfiltrate them in subsequent tool calls. The authors formalize this phenomenon and demonstrate that flat provenance tracking is inadequate for defending against this class of attacks.

Causality Laundering: Attack Taxonomy and Formalization

Causality laundering proceeds in three steps: an adversary-controlled probe triggers a protected tool call, the denial is observed (possibly along with its stated reason), and follow-up tool calls are synthesized whose parameters encode information derived from the denial. Crucially, no data flows directly from the denied resource to the exfiltration channel; the leak passes implicitly through the denial event itself and the agent's internal state. The paper delineates three major attack variants:

Denial inference exfiltration: A single denial outcome reveals a sensitive fact (e.g., that a resource exists).
Multi-probe fingerprinting: The agent aggregates patterns over multiple denials, enabling adversarial inference of system policy or asset structure.
Laundered composition: Denial-induced signals are "laundered" through benign tool calls or intermediate computations, obscuring the causal path.

The authors formalize a causality laundering attack as a condition on the provenance graph: a downstream action that is both observable and privileged is causally influenced by a prior denial, yet no explicit data-flow path from a successful call connects the two.
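The graph condition can be made concrete with a small sketch. This is not the paper's formal definition: the `ProvenanceGraph` class, the edge sets, and the tool names are hypothetical, and the predicate simply checks "reachable once counterfactual edges are included, but not reachable over data edges alone".

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Toy provenance graph; node names and edge kinds are illustrative only."""
    data_edges: set = field(default_factory=set)            # (src, dst): explicit data flow
    counterfactual_edges: set = field(default_factory=set)  # (denied_call, later_call)
    denied: set = field(default_factory=set)                # denied-action nodes

    def reachable(self, src, dst, edges):
        """Depth-first search from src to dst over the given edge set."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(d for (s, d) in edges if s == node)
        return False


def is_causality_laundering(graph, denied_node, sink):
    """True if sink is influenced by the denial but by no explicit data flow from it."""
    if denied_node not in graph.denied:
        return False
    all_edges = graph.data_edges | graph.counterfactual_edges
    influenced = graph.reachable(denied_node, sink, all_edges)
    explicit = graph.reachable(denied_node, sink, graph.data_edges)
    return influenced and not explicit


# Hypothetical session: a denied payroll read followed by an outbound email.
g = ProvenanceGraph()
g.denied.add("read_payroll")
g.data_edges.add(("search_docs", "send_email"))
g.counterfactual_edges.add(("read_payroll", "send_email"))

print(is_causality_laundering(g, "read_payroll", "send_email"))  # True
print(is_causality_laundering(g, "search_docs", "send_email"))   # False: not a denial
```

The second call returns False because `search_docs` succeeded; its influence on the email is an ordinary data flow that conventional taint tracking already covers.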

Limitations of Existing Defenses

The paper reviews related techniques and demonstrates their insufficiency for denial-feedback leakage:

Flat taint and information-flow systems (e.g., FIDES) model only explicit flows; denials are non-events with respect to provenance, so selective exfiltration via denial signals is invisible.
Graph-based dependency tracking (e.g., PCAS) only connects successful tool calls—not attempted-but-denied actions—to downstream effects.
Causal attribution via replay or ablation (e.g., AgentSentry, CausalArmor) is oriented toward detecting untrusted content injections but does not treat enforcement-generated denial events as possible sources of downstream influence.

A central claim is that none of these models treat denied actions as first-class events that can causally influence later behavior, which leaves this attack surface unenforced.
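A minimal sketch of why a flat tracker is blind here, assuming a deliberately simplified value-level taint set. `FlatTaintTracker` and the tool names are hypothetical and do not reproduce FIDES, PCAS, or the paper's baseline.

```python
class FlatTaintTracker:
    """Toy flat taint tracker: only values returned by successful calls get labels."""

    def __init__(self):
        self.tainted = set()

    def record_result(self, value, trusted):
        # Successful tool results are labeled; untrusted values become tainted.
        if not trusted:
            self.tainted.add(value)

    def record_denial(self, tool_name):
        # A denial returns no data, so a flat tracker records nothing for it.
        return None

    def allows(self, args):
        # A call is allowed only if none of its arguments is a tainted value.
        return all(arg not in self.tainted for arg in args)


tracker = FlatTaintTracker()
tracker.record_result("ignore previous instructions", trusted=False)
tracker.record_denial("read_payroll")  # the probe is denied

# The agent now encodes what it learned from the denial into a fresh string.
leak = "payroll_record_exists=true"

blocked_direct = not tracker.allows(["ignore previous instructions"])
blocked_leak = not tracker.allows([leak])
print(blocked_direct)  # True: the explicit untrusted value is caught
print(blocked_leak)    # False: the denial-derived value carries no taint
```

The leaked string was never returned by any tool, so there is no label to propagate; the information reached the agent through the denial outcome alone.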

ARM: Denial-Aware Provenance and Enforcement Design

To mitigate this class of attacks, the paper proposes the Agentic Reference Monitor (ARM)—a deterministic, runtime enforcement layer that explicitly tracks denied actions in a session-level provenance graph. Key architectural choices and features include:

Complete boundary mediation: ARM interposes on all tool invocations and responses, presenting itself as an MCP proxy.
Provenance graph extension: Enriches standard provenance graphs with DeniedAction nodes and Counterfactual edges. Each Counterfactual edge links a denial to a tool call that follows it in time, conservatively marking that call as potentially influenced.
Trust propagation via an integrity lattice: Trust propagates transitively through all tool call inputs (including derived field-level values and denial-induced contexts); a value derived from several inputs takes the lowest trust level among them.
Layered enforcement pipeline: Rigid staged checks, including unconditional boundaries, provenance-based trust and counterfactual queries, auto-generated schema constraints, and immutable operator-specified capability tokens.
Tamper-evident audit: Hash-chained logs for enforcement events ensure forensic verifiability.
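Two of the features above lend themselves to a short illustration: minimum-dominates trust propagation and a hash-chained audit log. The sketch below assumes a three-level total order (`LOW`, `MEDIUM`, `HIGH`) and a JSON-serialized event format; both are hypothetical choices, since the summary does not specify the paper's lattice levels or log schema.

```python
import hashlib
import json
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels; the paper's actual lattice may differ."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def combine_trust(*levels):
    """A derived value takes the lowest trust among its inputs."""
    return min(levels, default=Trust.HIGH)


GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def append_event(log, event):
    """Append an enforcement event, chaining it to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})


def verify_chain(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


audit_log = []
append_event(audit_log, {"tool": "read_payroll", "decision": "deny"})
append_event(audit_log, {"tool": "send_email", "decision": "deny"})

print(combine_trust(Trust.HIGH, Trust.LOW, Trust.MEDIUM).name)  # LOW
print(verify_chain(audit_log))                                  # True
```

Because each hash covers the previous hash, rewriting an early decision invalidates every later entry, which is what makes after-the-fact tampering detectable.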

ARM's graph-aware Layer 2 (L2G) enforces policy over both explicit data dependencies and denial-induced counterfactual edges. Any tool call that is reachable in the graph from a denied action is itself denied unless an explicit policy override applies.
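The enforcement rule can be sketched as a small session monitor. This is an illustration under stated assumptions, not ARM's implementation: `SessionMonitor`, the `protected` and `overrides` sets, and the tool names are hypothetical, and reachability is reduced to the one-hop temporal heuristic (every call after a denial gets a counterfactual edge).

```python
class SessionMonitor:
    """Toy denial-aware monitor: calls after a denial are treated as influenced."""

    def __init__(self, protected, overrides=()):
        self.protected = set(protected)    # tools that are always denied (assumed policy)
        self.overrides = set(overrides)    # tools exempt from counterfactual blocking
        self.denied_nodes = []             # DeniedAction nodes seen this session
        self.counterfactual_edges = []     # (denied_node, later_call_node)
        self._counter = 0

    def mediate(self, tool):
        """Return "allow" or "deny" for a tool call and update the graph."""
        self._counter += 1
        node = f"{tool}#{self._counter}"

        if tool in self.protected:
            self.denied_nodes.append(node)
            return "deny"

        # Temporal heuristic: link every earlier denial to this call.
        for denied in self.denied_nodes:
            self.counterfactual_edges.append((denied, node))

        influenced = bool(self.denied_nodes)
        if influenced and tool not in self.overrides:
            return "deny"
        return "allow"


monitor = SessionMonitor(protected={"read_payroll"}, overrides={"get_time"})
decisions = [
    monitor.mediate("send_email"),    # before any denial
    monitor.mediate("read_payroll"),  # the probe
    monitor.mediate("send_email"),    # potential exfiltration after the denial
    monitor.mediate("get_time"),      # explicitly overridden tool
]
print(decisions)  # ['allow', 'deny', 'deny', 'allow']
```

The same `send_email` call is allowed before the probe and denied after it, which is the behavior a flat tracker cannot express. The sketch also makes the cost visible: without overrides, every later call in the session would be blocked.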

Empirical Evaluation

Three explicit attack scenarios, constructed to stress distinct provenance dimensions, provide a differential test against a flat provenance baseline:

Causality laundering (denial inference exfiltration): Flat taint misses exfiltration via denial; ARM’s counterfactual edge blocks it.
Transitive taint chain: Data is laundered through benign tool calls; only ARM’s reachability analysis detects the attack.
Mixed-provenance field exploit: A structured result carries fields with different trust levels; only ARM's field-level provenance can tell which downstream use is risky.

In all three scenarios, ARM blocks the attack that the flat baseline misses, with enforcement latency under 1 ms per tool call. This indicates that the approach is technically feasible, although the evidence is limited to these constructed scenarios.
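The mixed-provenance scenario can be illustrated with a short sketch. It assumes, hypothetically, that a coarse baseline labels a whole result by the trust of the tool that produced it; the summary does not say how the paper's baseline assigns labels, and `ToolResult`, `FieldValue`, and the field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FieldValue:
    value: str
    trusted: bool  # provenance of this individual field


@dataclass
class ToolResult:
    tool_trusted: bool  # coarse label for the whole result (assumed baseline behavior)
    fields: dict        # field name -> FieldValue


def record_level_allows(result, used_fields):
    """Coarse check: every field inherits the producing tool's label."""
    return result.tool_trusted


def field_level_allows(result, used_fields):
    """Fine-grained check: only the fields actually used must be trusted."""
    return all(result.fields[name].trusted for name in used_fields)


# A trusted ticketing tool returns a system-set ID and a user-written comment.
ticket = ToolResult(
    tool_trusted=True,
    fields={
        "ticket_id": FieldValue("T-1042", trusted=True),
        "comment": FieldValue("wire funds to account 000", trusted=False),
    },
)

print(record_level_allows(ticket, ["comment"]))   # True: the risky use slips through
print(field_level_allows(ticket, ["comment"]))    # False: untrusted field is caught
print(field_level_allows(ticket, ["ticket_id"]))  # True: benign use still works
```

Field-level labels block the risky use without forcing the monitor to distrust the whole record, so benign uses of the trusted field remain available.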

Implications and Future Challenges

This denial-aware, provenance-native enforcement model captures implicit feedback channels that earlier LLM security enforcement ignored. The ARM mechanism should carry over to tool-calling agent platforms that support boundary mediation, graph-based reasoning, and deterministic runtime enforcement.

However, several empirical and theoretical limitations are acknowledged:

Heuristic false positives: Temporal adjacency over-approximates causal influence, so benign actions that merely follow a denial may be blocked.
Coverage: Scenarios are crafted and limited; benchmark-scale evaluation (e.g., AgentDojo, InjectAgent) is still required for coverage and false positive analysis.
Multi-agent and compositional reasoning: Extension to multi-agent causal chains and delegation is left for future work.
Declassification: Safe and auditable trust upgrades for sanitized values still present an open problem.

ARM does not solve every implicit-flow or covert-channel problem. Its practical value is that it enables enforcement over a channel that earlier provenance models left unmodeled, which raises the cost of attacks on LLM agent deployments.

Theoretical and Practical Impact

From a theoretical perspective, this work aligns security policy enforcement models for LLM agents with the classical information-flow and causality literature (Denning's lattice model, Pearl's counterfactual causality), extending enforceable provenance to include implicit denial-induced flows. Practically, the approach can be adopted in LLM agent infrastructures that have centralized mediation, strong audit requirements, and composable policy enforcement.

Conclusion

This paper identifies and formalizes denial-feedback leakage (causality laundering) as a concrete operational risk in tool-calling LLM agents. By representing denied tool invocations as first-class provenance events and enforcing downstream policies over counterfactual influence, the ARM architecture addresses a gap that flat data-flow enforcement leaves open. As LLM-driven automation grows in importance, runtime agent governance of this kind will matter more for practical system security and compliance. The approach clarifies what provenance-driven agent security needs to track and enforce, and it provides a foundation for extending these ideas to more complex agentic systems and workflows.

Reference: "Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents" (2604.04035)
