- The paper demonstrates a governance-first architecture, where Arbiter-K separates untrusted LLM outputs from a deterministic symbolic kernel, achieving unsafe interception rates of up to 94.20%.
- The paper introduces a Semantic ISA with five logical cores that enables instruction-level provenance tracking and policy-driven control over agentic computations.
- The paper validates a novel neuro-symbolic taint tracking method and policy feedback loop, substantially reducing token waste and enhancing context reuse during execution.
Governance-First Architecture for Agentic AI: Arbiter-K and Semantic ISA
Motivation and Problem Analysis
The shift from prototype to production in agentic AI exposes systemic fragilities rooted in the orchestration paradigm, where control loops are delegated to LLMs and patched heuristically. This results in inherent vulnerability: untrusted, stochastic inference engines are given privileged authority, leading to non-deterministic trajectories that interact with system state directly. The research demonstrates that text-centric reactive guardrails miss more than 91% of unsafe operations during stateful execution, especially under indirect semantic injection in environments such as AgentDojo and Agent-SafetyBench (Figure 1).
Figure 1: An example of a semantic deviationโa prompt injection leading to an unauthorized tool call.
The abort-and-retry regime in current governance approaches incurs substantial waste. Violations detected late cause full session abandonment, discarding accumulated context and failing to sanitize only the dangerous suffix. As task length increases, token waste is amplified, destabilizing completion rates and imposing a governance tax (Figure 2).

Figure 2: Cumulative governance waste as task length increases, driven by repeated abort-and-retry actions.
Arbiter-K addresses these limitations by treating failed agentic trajectories as analyzable evidence. Policy feedback loops are introduced, converting semantic deviations into actionable constraints, reducing repeated failures and enabling context reuse. This is illustrated by the systemโs capability to extract failure signatures and inform future policy adjustments (Figure 3).
Figure 3: Policy feedback loop: failed trajectories yield evidence for subsequent execution guidance.
System Architecture: From Craft to Kernel
Arbiter-K is defined as a governance-first execution architecture. A strict separation is established between an untrusted Probabilistic Processing Unit (PPU)โthe LLM acting as proposal generatorโand a deterministic Symbolic Kernelโresponsible for enforcing invariants such as schemas, budgets, and permissions (Figure 4).
Figure 4: Architecture of Arbiter-K, showing bifurcation between probabilistic and deterministic security domains.
The kernel intercepts intents before execution, validates against active policies, and maintains a Security Context Registry and Instruction Dependency Graph. This framework enables active taint propagation and fine-grained privilege enforcement, ensuring unsafe trajectories are interdicted pre-sink.
Semantic Instruction Set Architecture
A formal Semantic ISA is presented, partitioned into five logical cores: Cognitive, Memory, Execution, Normative, and Meta-cognitive (Figure 5). Each instruction has explicit operational function and governance properties, categorizing operations as probabilistic proposals or deterministic actions. Instruction bindings are managed via strictly-typed schemas, validating data flow from untrusted reasoning to trusted execution.
Figure 5: The five instruction cores of Arbiter-Kโs Semantic ISA.
Arbiter-Kโs neuro-symbolic ISA enables instruction-level provenance tracking and policy-driven control. For instance, risky operations such as TOOL_BUILD must pass through deterministic verification, and taint-aware propagation is enforced throughout execution flows.
Neuro-Symbolic Taint Tracking
Arbiter-K adapts taint analysis from compilers to agentic AI, tagging data from external sources, local sensitive files, and PPU outputs. Taint is propagated along instruction trajectories; deterministic sinks (e.g., SQL_EXECUTE) are enforced such that tainted data cannot be used unless explicitly sanitized via verification instructions. Global trace recording bifurcates interpretable and kernel-level metadata, supporting architectural recovery and runtime policy refinement (Figure 6).
Figure 6: Taint analysis procedure: data flow, label propagation, enforcement, and sanitization.
Symbolic Policy Engine and Reliability Budget
The kernelโs policy engine embeds three policy categories: global consensus, task-specific constraints, and dynamic trace-driven refinements. Policies are encoded as right-linear grammars or FSMs, enabling static and runtime trajectory validation. To manage governance tax, Arbiter-K introduces Reliability Budgetsโexplicit allocations over computation and residual riskโoptimizing intervention selection according to strategic importance of each execution step.
Empirical Evaluation and Results
Arbiter-K is implemented on OpenClaw and NanoBot, evaluated using benchmarks with 1,914 unsafe and 412 benign cases. Native host policies intercept only 0โ9% of unsafe actions; Arbiter-K achieves 76โ95% blocking rates, with 92.95% interception on OpenClaw and 94.20% on NanoBot. False interception rates are consistently below 6% for benign operations (Figure 7).
Figure 7: Performance profile: Arbiter-K dramatically increases unsafe interception rate compared to native policies.
Instruction coverage is comprehensive, with Arbiter-K semantically interpreting and labeling all observed shell-execution instances, demonstrating robust parser and risk-labeling capabilities (Figure 8).
Figure 8: Arbiter-K instruction coverage across diverse command primitives.
Ablation studies reveal that security gains are primarily attributable to relational and unary semantic policy layers, rather than host-specific rules (Figure 9). Combined semantic policies outperformed host-centric heuristics, emphasizing architectural enforcement.
Figure 9: Policy ablation: semantic policies drive unsafe interception, host rules have limited impact.
Policy feedback allows context reuseโmore than 70% of trajectory tokens are retained when dangerous steps are blocked early, with policy feedback messages costing less than 300 tokens. Arbiter-K achieves consistent early blocking: in 98.33% of runs, unsafe trajectories are intercepted, with median block onset at 50% completion, compared to 23.01% and 80% for native policies.
Practical and Theoretical Implications
Arbiter-K demonstrates that microarchitectural security invariants can be enforced without degrading agentic reasoning or utility. The separation of probabilistic proposal from deterministic privilege control introduces a scalable governance model applicable to diverse agentic workloads. Reifying LLM outputs into granulated instructions enables formal architectural auditing, taint-propagation, and optimized resource allocation.
Practically, Arbiter-Kโs design addresses the pervasive crisis of handcrafted orchestration, enabling a structured migration path for legacy agents. Theoretical implications include new bounds for safety analysis in agentic AI, and a foundation for neuro-symbolic runtime architectures that support formal verification and control in autonomous computational substrates. The modularity of the policy engine permits dynamic adaptation and trace-driven evolution, aligning with trends toward AIOS and multi-agent operating systems.
Future Directions
Anticipated developments include further integration of Arbiter-Kโs symbolic governance into large-scale agent ecosystems, expansion of policy grammars and trace analysis techniques, and application to real-time and embodied AI scenarios. Scaling to multi-agent coordination, cross-session privilege tracking, and formal guarantees over complex environment-altering workflows will be critical. Arbiter-Kโs principles may seed new abstractions for risk reduction, reliability budgeting, and resilient agentic computing across domains.
Conclusion
Arbiter-K introduces a governance-first execution architecture, incorporating a Semantic ISA and deterministic symbolic kernel for agentic AI. By enforcing instruction-level privilege and taint-aware dependency tracking, Arbiter-K substantially increases unsafe interception ratesโdemonstrating architectural efficacy and practical impact on agent security and reliability. The system provides a formal foundation for the separation of stochastic reasoning from deterministic enforcement, enabling trace-driven policy refinement and robust execution control for agentic computers (2604.18652).