Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

Published 18 May 2026 in cs.AI | (2605.18672v1)

Abstract: This position paper argues that enforcing LLM agent safety within a single abstraction layer is not merely suboptimal but categorically insufficient for deployed LLM agents -- a structural consequence of how agent execution works, not a contingent limitation of current systems. The three dimensions that jointly constitute safe operation -- semantic intent and policy compliance, environmental validity, and dynamical feasibility -- each depend on a strictly distinct set of information that becomes available at different stages of execution. No single guardrail can certify all three. We argue that the community must respond with a contract-based architecture in which each safety dimension is enforced by an independently certified layer whose probabilistic guarantee satisfies the next layer's assumption. We sketch such an architecture and derive the compositional system-level safety bounds it admits via the chain rule of probability. Three open problems stand between this and a deployable standard: bound estimation from non-i.i.d. traces, graceful degradation of contracts under deployment drift, and extension to multi-agent settings -- the most important unfinished business in LLM agent runtime assurance.


Authors (9)

Summary

The paper presents a formal argument that single-layer safety mechanisms are inadequate for LLM agent deployment.
It introduces a three-layer architecture that separately enforces semantic, operational, and dynamical safety using probabilistic assume-guarantee contracts.
Illustrative numerical bounds, anchored to published empirical evaluations, indicate that independently certified safety layers raise the system-level safety bound, while a single weak layer degrades every bound.

Structural Necessity of a Three-Layer Probabilistic Assume-Guarantee Architecture for Safe LLM Agent Deployment

Introduction

The paper "Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment" (2605.18672) presents a formal structural argument asserting that single-layer safety enforcement mechanisms are categorically inadequate for the deployment of LLM-based agents in safety-critical settings. The crux is that agent execution exposes three heterogeneous safety dimensions—semantic intent and policy compliance, environmental validity, and dynamical feasibility—each predicated on orthogonal and temporally distinct information sets. The authors articulate that any architectural design failing to enforce all three guarantees at independent layers leaves intrinsic vulnerabilities, irrespective of how systematically engineered the guardrails or runtime filters might be. This position is defended using a contract-based design framework with quantifiable probabilistic safety bounds, and accompanied by a discussion of theoretical and practical challenges that must be resolved for operationalization.

Motivation and Problem Analysis

The deployment of LLM agents in real-world or safety-critical environments diverges sharply from classical software or static LLMs due to the non-deterministic, closed-loop nature of agent execution. LLM agents translate linguistic instructions into action plans, interact with dynamic environments, and potentially affect physical systems, thereby amplifying risks associated with hallucinations, distributional instability, prompt injection, and adversarial manipulation [LLMtrustworthySurvey].

Empirical results underscore the inadequacy of current approaches: none of sixteen popular LLM agents evaluated on AgentSafetyBench [zhang2024agentsafetybench] achieve safety scores above 60%, with environmental and behavioral safety lagging content safety by significant margins. Furthermore, attack success rates on Agent Security Bench frequently exceed 84%. These observations are consistent with the structural thesis: failures predominantly arise in dimensions unreachable by semantic-only (single-layer) controls.

Existing frameworks such as shield agents, SMT-constrained generation, and code-level guardrails (e.g., AgentSpec, Agent-C, ShieldAgent [wang2025agentspeccustomizableruntimeenforcement, kamath2025enforcingtemporalconstraintsllm, chen2025shieldagent]) have made advances, but remain confined to semantic-layer enforcement and fail to address operational and dynamical feasibility.

Structural Argument and Architectural Prescription

The central claim is logically strong: safe LLM agent deployment is structurally impossible under any single-layer enforcement design, independent of implementation methodology. The three safety dimensions become verifiable only at strictly distinct stages:

Semantic (User Assurance): Intent alignment, policy and ethical compliance, and user authorization—verifiable before world observation.
Operational (ODD Verification): Validity of the current world state (within the Operational Design Domain)—verifiable only after sensor data and world state estimation.
Dynamical (Functional Assurance): Execution feasibility and real-time safety constraints—verifiable only during actuation and control-loop execution.

This temporal and informational separation induces a strict causal ordering, formalized via measurable predicates over sub-$\sigma$-algebras that partition available information across stages. The impossibility of collapse into fewer than three layers is proven: any such architecture either leaves safety dimensions uncertified, breaks assume-guarantee contract chains, or reconstructs the requisite boundaries in an un-auditable manner (Appendix B).
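The staged availability of information described above can be sketched as a pipeline in which each check receives only what exists at its stage. This is a minimal illustration, not the paper's implementation: the types `Plan`, `WorldState`, and `ControlState`, the three check functions, and `run_staged` are all hypothetical names, and each predicate is reduced to a single boolean or margin.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Plan:
    action: str
    authorized: bool  # hypothetical stand-in for intent/policy compliance


@dataclass(frozen=True)
class WorldState:
    in_odd: bool  # hypothetical stand-in for ODD membership


@dataclass(frozen=True)
class ControlState:
    constraint_margin: float  # > 0 is taken to mean dynamically feasible


def semantic_check(plan: Plan) -> bool:
    # Stage 1: only the plan exists; no world observation yet.
    return plan.authorized


def operational_check(plan: Plan, world: WorldState) -> bool:
    # Stage 2: needs the estimated world state.
    return world.in_odd


def dynamical_check(plan: Plan, world: WorldState, ctrl: ControlState) -> bool:
    # Stage 3: needs control-loop information available only at actuation.
    return ctrl.constraint_margin > 0.0


def run_staged(
    plan: Plan,
    observe: Callable[[], WorldState],
    probe_control: Callable[[], ControlState],
) -> str:
    """Run the three checks in causal order, acquiring information lazily."""
    if not semantic_check(plan):
        return "rejected:semantic"
    world = observe()  # world state becomes available only here
    if not operational_check(plan, world):
        return "rejected:operational"
    ctrl = probe_control()  # control state becomes available only here
    if not dynamical_check(plan, world, ctrl):
        return "rejected:dynamical"
    return "executed"
```

In this sketch a semantically rejected plan never triggers `observe`, which mirrors the claim that later predicates cannot be evaluated at earlier stages because their inputs do not yet exist.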

Probabilistic Assume-Guarantee Contracts and Compositional Safety Bounds

To operationalize modular safety, the authors advocate for probabilistic assume-guarantee (A/G) contracts [delahaye2011probabilistic, hampus2024probabilistic]. Each layer carries a contract $(A_i, \Gamma_i)$ : when inputs satisfy assumption $A_i$ , guarantee $\Gamma_i$ holds with probability $p_i$ . The contracts sequentially compose: the guarantee of layer $i$ satisfies assumption $A_{i+1}$ .
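One way to picture sequential composition is to encode each assumption and guarantee as a set of named properties and require that every property a layer assumes is guaranteed by the layer before it. The encoding below is an assumption made for illustration only; `Contract`, `chain_is_well_formed`, and the property names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Contract:
    name: str
    assumes: FrozenSet[str]     # A_i, encoded as named properties (assumption)
    guarantees: FrozenSet[str]  # Gamma_i, encoded the same way (assumption)
    p: float                    # probability that Gamma_i holds given A_i


def chain_is_well_formed(contracts: Sequence[Contract]) -> bool:
    """Check that each layer's guarantee covers the next layer's assumption."""
    for upstream, downstream in zip(contracts, contracts[1:]):
        if not downstream.assumes <= upstream.guarantees:
            return False
    return True


# Hypothetical three-layer chain; names and probabilities are illustrative.
user = Contract("user", frozenset(), frozenset({"plan_authorized"}), 0.95)
operational = Contract(
    "operational", frozenset({"plan_authorized"}), frozenset({"state_in_odd"}), 0.97
)
functional = Contract(
    "functional", frozenset({"state_in_odd"}), frozenset({"trajectory_safe"}), 0.99
)
```

Calling `chain_is_well_formed([user, operational, functional])` returns `True`, while dropping the middle contract breaks the chain because the functional layer's assumption is no longer guaranteed by anything upstream.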

The system-level safety probability admits four formal bounds:

Fréchet–Bonferroni bound (B1): $\Pr(\text{safe}) \geq \max(0,\, p_U + p_O + p_F - 2)$
Pairwise co-failure adjustment (B2): incorporates $\Pr(F_i \cap F_j)$
Inclusion–exclusion formula (B3): $\Pr(\text{safe}) = 1 - \Pr(F_U \cup F_O \cup F_F)$
Chain-rule decomposition (B4): $\Pr(\text{safe}) = \Pr(\bar F_U)\,\Pr(\bar F_O \mid \bar F_U)\,\Pr(\bar F_F \mid \bar F_U \cap \bar F_O)$
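Three of these quantities follow directly from elementary probability and can be sketched in a few lines. The sketch assumes that $F_i$ denotes the failure event of layer $i$ and that $p_i$ is the probability that layer $i$ does not fail; the function names are hypothetical, and the pairwise co-failure adjustment is omitted because its exact form is not given here.

```python
from typing import Sequence


def frechet_lower_bound(p_u: float, p_o: float, p_f: float) -> float:
    """Lower bound on Pr(safe) that needs no dependence assumption."""
    return max(0.0, p_u + p_o + p_f - 2.0)


def inclusion_exclusion_safe(
    q: Sequence[float],       # Pr(F_U), Pr(F_O), Pr(F_F)
    q_pair: Sequence[float],  # Pr(F_U & F_O), Pr(F_U & F_F), Pr(F_O & F_F)
    q_triple: float,          # Pr(F_U & F_O & F_F)
) -> float:
    """Exact Pr(safe) = 1 - Pr(F_U or F_O or F_F) for three events."""
    union = sum(q) - sum(q_pair) + q_triple
    return 1.0 - union


def chain_rule_safe(p_u: float, p_o_given_u: float, p_f_given_uo: float) -> float:
    """Exact Pr(safe) as a product of successive conditional probabilities."""
    return p_u * p_o_given_u * p_f_given_uo
```

When the three failure events are independent, the inclusion-exclusion value coincides with the plain product of the per-layer probabilities, which makes a convenient sanity check for estimated co-failure terms.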

The conditional probabilities in the chain-rule decomposition are both deployment-dependent and independently estimable if upstream filtering is effective. No two-stage collapse architecture yields a valid or independently estimable chain-rule decomposition (Appendix B, Prop. 2), further cementing the necessity of the three-layer structure.

Illustrative numerical results anchored to empirical evaluations [chen2025shieldagent, wang2025agentspeccustomizableruntimeenforcement, lei2026offtopiceval, khan2025safer, mestres2025probabilistic, urrea2026probabilistic] show compositional gains: with realistic per-layer probabilities for the User, Operational, and Functional layers, the system-level safety bound increases markedly under effective upstream filtering compared to naive independence, and a single weak layer induces irrecoverable degradation in all bounds (Appendix C).
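A short worked calculation may make the comparison concrete. All numbers below are hypothetical and are not the paper's values: assume per-layer probabilities $p_U = 0.95$, $p_O = 0.97$, $p_F = 0.99$, and assume that effective upstream filtering raises the conditionals to $\Pr(\bar F_O \mid \bar F_U) = 0.99$ and $\Pr(\bar F_F \mid \bar F_U \cap \bar F_O) = 0.995$, where $\bar F_i$ denotes the event that layer $i$ does not fail.

```latex
\begin{align*}
\text{Fr\'echet--Bonferroni:}\quad
  & \Pr(\text{safe}) \geq \max(0,\, 0.95 + 0.97 + 0.99 - 2) = 0.91 \\
\text{Naive independence:}\quad
  & \Pr(\text{safe}) = 0.95 \times 0.97 \times 0.99 \approx 0.9123 \\
\text{Chain rule with upstream filtering:}\quad
  & \Pr(\text{safe}) = 0.95 \times 0.99 \times 0.995 \approx 0.9358 \\
\text{One weak layer } (p_U = 0.60):\quad
  & \Pr(\text{safe}) \leq \min(p_U,\, p_O,\, p_F) = 0.60
\end{align*}
```

The last line holds because the safe event is contained in each layer's individual success event, so no combination of strong downstream layers can lift system-level safety above the weakest layer's probability.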

Technical Layer Descriptions

User Assurance Layer: Validates plans for compliance with intent, policies, ethics, and user authorization. Generates quantitative bounds and exclusion zones passed downstream. Open challenges include automated formalization of intent and policy into verifiable constraints.
Operational Assurance Layer: Checks world-state membership in certified ODDs, determines autonomy envelope, and triggers fallback on invalid or degraded ODD conditions. Deterministic verdicts are essential for auditability. Open issues include robust certification under distributional shift.
Functional Assurance Layer: Enforces real-time dynamical safety via specification-based monitoring, quadratic CBF projection, and simulation-based envelope synthesis. Reports safety signals upward, triggering plan recomputation or human fallback. Real-time adaptation and feedback integration remain open.

Each layer emits safety signals and compositional constraints, forming a live bidirectional assurance loop.
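The quadratic CBF projection mentioned for the Functional Assurance Layer can be illustrated on a toy system, reading CBF as control barrier function. The sketch assumes a one-dimensional single integrator $\dot{x} = u$ with the constraint $x \leq x_{\max}$ and barrier $h(x) = x_{\max} - x$; the condition $\dot{h} \geq -\alpha h$ then becomes a single half-space constraint on $u$, and the quadratic program that minimizes $\lVert u - u_{\text{nom}} \rVert^2$ has a closed-form solution. The function names, the system, and the parameter values are assumptions for illustration, not the paper's controller.

```python
from typing import List, Sequence, Tuple


def project_onto_halfspace(
    u_nom: Sequence[float], a: Sequence[float], b: float
) -> List[float]:
    """Closed-form solution of: min ||u - u_nom||^2 subject to a . u >= b."""
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint normal must be nonzero")
    violation = b - sum(ai * ui for ai, ui in zip(a, u_nom))
    if violation <= 0.0:
        return list(u_nom)  # nominal input already satisfies the constraint
    scale = violation / norm_sq
    return [ui + scale * ai for ai, ui in zip(a, u_nom)]


def cbf_halfspace_1d(x: float, x_max: float, alpha: float) -> Tuple[List[float], float]:
    """CBF condition for x' = u, h(x) = x_max - x, written as a . u >= b.

    h' = -u, so h' >= -alpha * h is equivalent to -u >= -alpha * (x_max - x).
    """
    return [-1.0], -alpha * (x_max - x)


def safe_input(x: float, u_nom: float, x_max: float = 1.0, alpha: float = 1.0) -> float:
    """Minimally modify u_nom so that the CBF condition holds at state x."""
    a, b = cbf_halfspace_1d(x, x_max, alpha)
    return project_onto_halfspace([u_nom], a, b)[0]
```

Far from the boundary the nominal input passes through unchanged; near the boundary it is clipped to at most $\alpha\,(x_{\max} - x)$, which is the kind of intervention a functional layer could report upward as a safety signal.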

Limitations and Open Problems

The approach is scoped to single-agent systems; multi-agent deployments introduce new unsafe modes—cross-agent belief manipulation, collusion, and unverified provenance [hu2026lying, wang2026cot, cemri2025mast]. Extension to multi-agent settings requires validation over a fourth information domain (inter-agent provenance), coupled with explicit treatment of co-failure correlations.

The probabilistic bounds require robust estimation from non-i.i.d. execution traces, which classical PAC theory does not directly supply due to LLM non-determinism and shared backbone correlations [lotfi2024tokens, mohri2009, mohri2010, barber2023, davidov2026]. Marginal estimation is feasible via martingale or mixing-process approaches; tight conditional bounds remain an open frontier.
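As a rough illustration of the martingale route, the Azuma-Hoeffding inequality gives a lower confidence bound that does not require independent outcomes. The sketch below is not the paper's estimator: it assumes binary safe/unsafe outcomes observed in sequence, and it bounds the average of the conditional safety probabilities along the trace (each conditioned on the history so far), not a marginal probability under a fixed distribution. The function name is hypothetical, and the radius uses the conservative increment bound of 1.

```python
import math
from typing import Sequence


def azuma_lower_bound(outcomes: Sequence[int], delta: float) -> float:
    """Lower confidence bound on the average conditional safety probability.

    outcomes: 1 for a safe step, 0 for an unsafe step, in execution order.
    delta: allowed failure probability of the bound, with 0 < delta < 1.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one outcome")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    p_hat = sum(outcomes) / n
    # Azuma-Hoeffding with increments bounded by 1 in absolute value:
    # Pr(sum_k (Z_k - p_k) >= t) <= exp(-t^2 / (2 n)).
    radius = math.sqrt(2.0 * math.log(1.0 / delta) / n)
    return max(0.0, p_hat - radius)
```

The bound shrinks at the usual $1/\sqrt{n}$ rate, so certifying a per-layer probability close to 1 in this way needs long traces; the conditional terms of the chain-rule decomposition would additionally require traces filtered by upstream success.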

Graceful degradation under assumption violations, semantic drift, or environmental distribution shift (deployment drift) necessitates well-founded robust contract semantics—current advances include robust temporal logics and certified radii for token-level robustness [DonzeMaler10, FraenzleHansen05, robey2025smoothllm, wang2025clucert], but quantitative robustness guarantees for semantic plan-level violation remain an open direction.
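The idea behind robust temporal-logic semantics is that a specification returns a signed margin instead of a true/false verdict, so a contract can degrade by degrees. A minimal sketch for two simple properties over a sampled signal follows; the function names and the choice of properties are assumptions for illustration, and nothing here addresses the open plan-level problem.

```python
from typing import Sequence


def rob_always_below(signal: Sequence[float], c: float) -> float:
    """Robustness of 'always x < c': the worst-case margin over the trace.

    Positive means satisfied with that much slack; negative means violated
    by that amount at the worst sample.
    """
    return min(c - x for x in signal)


def rob_eventually_below(signal: Sequence[float], c: float) -> float:
    """Robustness of 'eventually x < c': the best-case margin over the trace."""
    return max(c - x for x in signal)
```

For the trace `[0.2, 0.5, 0.8]` and threshold `1.0`, the "always" margin is about `0.2`, so any perturbation smaller than that cannot flip the verdict; raising the last sample to `1.3` turns the margin negative and quantifies the size of the violation.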

Alternative Views and Implications

Learned end-to-end safety (RLHF, fine-tuning) and PAC-style model abstractions remain complementary but fail to provide compositional certification across heterogeneous layers. Boolean guarantees are theoretically preferable but unattainable under LLM and environmental non-determinism. Latency and optimality trade-offs exist but are unaddressed in this framework; empirical results suggest practical manageability but lack theoretical treatment.

The implications are profound for the engineering of AI safety architectures: this paper rigorously demonstrates that absence of structural separation and probabilistic contract chaining leaves intrinsic vulnerabilities and invalidates compositional safety certification. A modular layered approach, as prescribed, is not merely sufficient but minimal and structurally required.

Practically, adoption requires parallel advances in contract estimation, robustness, drift detection, and multi-agent provenance verification. The theoretical foundation established here prioritizes these directions as requisite infrastructure for trustworthy LLM agent deployment, akin to the trajectory in aviation and automotive safety engineering.

Conclusion

This paper rigorously resolves the structural question underlying runtime safety for LLM agents: semantic, operational, and dynamical safety are fundamentally non-unifiable absent three-layer enforcement, dictated by temporal and informational causal separation. The probabilistic assume-guarantee architecture delivers formally composable safety bounds and establishes a minimum viable structure for deployment in safety-critical environments (2605.18672). Future progress requires closing open estimation gaps, robust contract semantics, and multi-agent extension. The field must therefore prioritize theoretical compositionality as the substrate for practical agent safety, lest empirical patchwork solutions leave systemic vulnerabilities unaddressed.
