Authenticated Workflows: A Systems Approach to Protecting Agentic AI
Abstract: Agentic AI systems automate enterprise workflows, but existing defenses (guardrails, semantic filters) are probabilistic and routinely bypassed. We introduce authenticated workflows, the first complete trust layer for enterprise agentic AI. Security reduces to protecting four fundamental boundaries: prompts, tools, data, and context. We enforce intent (operations satisfy organizational policies) and integrity (operations are cryptographically authentic) at every boundary crossing, combining cryptographic elimination of attack classes with runtime policy enforcement. This delivers deterministic security: operations either carry valid cryptographic proof or are rejected. We introduce MAPL, an AI-native policy language that expresses agentic constraints dynamically as agents evolve and invocation context changes, scaling as O(log M + N) policies versus O(M × N) rules through hierarchical composition with cryptographic attestations for workflow dependencies. We prove practicality through a universal security runtime integrating nine leading frameworks (MCP, A2A, OpenAI, Claude, LangChain, CrewAI, AutoGen, LlamaIndex, Haystack) through thin adapters requiring zero protocol modifications. Formal proofs establish completeness and soundness. Empirical validation shows 100% recall with zero false positives across 174 test cases, protection against 9 of 10 OWASP Top 10 risks, and complete mitigation of two high-impact production CVEs.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a focused list of concrete gaps and open questions that the paper leaves unresolved, intended to guide future research and engineering work.
- Formal proofs and artifacts
- The paper cites Lemmas/Theorems (1–7) but does not provide formal models, assumptions, or machine-checked proof artifacts; how can the soundness and completeness claims be reproduced and validated?
- The “four boundaries are complete and minimal” claim lacks a formal adversarial model covering non-traditional channels (e.g., environment variables, filesystem side-effects, event buses, OS signals); does the minimality proof include these?
- Key and identity lifecycle
- How are agent/tool keypairs generated, stored (HSMs/TEEs), rotated, and revoked in production? What are recovery procedures after key compromise?
- How quickly can revocations propagate across distributed PEPs, and how is consistency ensured during network partitions?
- How is multi-tenant isolation enforced in the registry and key infrastructure?
- Enforcement integrity (L3) assumptions
- PEP integrity depends on “immutable frameworks” and instrumented APIs; what prevents runtime monkey-patching, dynamic plugin loading, or reflective calls from bypassing PEPs in interpreted languages?
- How to secure PEPs against local privilege escalation, process injection, or supply-chain compromise of adapters/verifiers (which are administrator code)?
- What guarantees exist when one endpoint cannot be wrapped (e.g., third-party SaaS APIs), or when only one-sided wrapping is feasible?
- External ecosystem and inter-organizational workflows
- How can authenticated workflows be extended to external SaaS tools and LLM APIs that cannot be instrumented and do not accept signed invocations?
- How are identities, policies, and attestations federated across organizational boundaries (e.g., B2B workflows)? What trust and revocation mechanisms apply across orgs?
- Availability and DoS
- The focus is on integrity/authorization; what protections address availability and DoS risks, given the per-call cost of signature verification, registry lookups, and verifier execution, and how do these costs behave under load or adversarial traffic?
- What happens when the control plane (registry, policy store, logging) is unavailable or partitioned? Is there a safe degraded mode, and what are its security implications?
- Performance and scalability
- Sub-millisecond verification is claimed, but there are no microbenchmarks for high-throughput/low-latency settings or end-to-end overhead across long-running, multi-hop workflows.
- Effects on LLM throughput/latency and cost at scale (e.g., tens of thousands of tool calls per session) remain unquantified; what caching strategies and batching help without weakening guarantees?
- TOCTTOU and race conditions
- How does the design address time-of-check-to-time-of-use between Stage 3 policy evaluation and operation execution, especially for mutable resources or shared state?
- What concurrency controls exist for attestations in parallel workflows to prevent reordering or partial-order ambiguity?
- Attestation trust and semantics
- How to prevent forged or misleading attestations from compromised services (A3)? Are TEEs, remote attestation, quorum attestations, or cross-checks supported?
- How are attestations revoked or invalidated (e.g., if prerequisite steps are later found flawed), and how are downstream dependencies re-evaluated?
- What is the expressiveness limit for temporal constraints (e.g., time-bounded, conditional, or partially ordered dependencies) beyond simple “A before B” attestations?
- MAPL expressiveness and operational constraints
- Some constraints require Turing-complete or dataflow-dependent logic (e.g., context-sensitive sanitization); when do custom verifiers become necessary, and how is their correctness assured?
- The “no overrides” policy simplifies proofs but may hinder break-glass scenarios; can time-bounded emergency policies be safely automated and audited at scale?
- How are policy conflicts diagnosed and resolved (e.g., deadlock/liveness issues from intersecting denials), and what tooling supports debugging?
- Policy engineering at scale
- The O(log M + N) claim is theoretical; what is the empirical policy count and administrative burden in large enterprises with frequent org changes and dynamic agent creation?
- How are policies versioned, rolled out, and rolled back without breaking running workflows? Are transactional updates supported, and what consistency guarantees apply?
- What safeguards mitigate risks of policy misconfiguration (which could create either over-permissive or over-restrictive behavior)?
- Coverage gaps in threat surface
- The evaluation claims protection against 9/10 OWASP LLM Top 10 risks; which risk remains unmitigated and why? What roadmap addresses it?
- The scope excludes application-level safety (e.g., detecting malicious code or prompt semantics); how should practitioners combine this system with semantic defenses without brittle heuristics?
- Confidentiality and covert channels
- The paper emphasizes integrity/authorization; how are confidentiality risks addressed (e.g., exfiltration via allowed operations, covert channels through tool outputs, membership inference via LLMs)?
- Can policies express information flow constraints (e.g., non-interference properties) or require declassification steps with cryptographic proofs?
- Audit and privacy
- Non-repudiable audit logs may include sensitive content; how are logs minimized, redacted, encrypted, and access-controlled to meet privacy regulations (e.g., GDPR right to erasure)?
- Are there mechanisms for selective disclosure or zero-knowledge proofs to satisfy auditors without leaking sensitive data?
- Adapter ecosystem and maintainability
- Adapters are “thin” (200–500 LOC), but how are they maintained across fast-moving framework updates and API changes? Is there a standardization effort to avoid adapter drift?
- What certification or verification process ensures adapter correctness and prevents backdoors?
- Partial adoption and legacy systems
- What security guarantees hold when only some boundaries are instrumented (e.g., S2 tools protected but S3 data retrieval unprotected)? Is there a graded assurance model?
- How can legacy systems without modifiable interfaces be integrated (e.g., via network gateways or proxies) without breaking determinism?
- Generalization of empirical results
- The “100% recall/0% false positives” results are limited to 174 test cases; how representative are these of real-world deployments and adversaries? Is there a public benchmark suite?
- How do custom verifiers (which may be heuristic) affect precision/recall in practice, and how are their false positives controlled and measured?
- Multi-agent dynamics and liveness
- Intersection semantics ensure monotonic restriction, but what guarantees exist for liveness (i.e., that legitimate workflows can complete) in deeply nested multi-agent delegations?
- How to reason about emergent harms from components that each act within policy but collectively cause unsafe outcomes (policy compositionality vs. system-level safety)?
- Cross-framework compositions and edges
- How are non-HTTP transports, event-driven systems, and message queues authenticated and verified within this model?
- Are there canonical bindings for common protocols (gRPC, WebSockets, Kafka) that preserve end-to-end guarantees without excessive overhead?
- Governance and organizational process
- Who owns policy authoring and approval (security vs. application teams)? How is separation of duties enforced and audited?
- What human-in-the-loop controls exist for sensitive actions, and how are they authenticated and attested without undermining automation?
- Supply-chain and model integrity
- The design assumes cryptographic hardness but does not address compromised model weights, fine-tuning artifacts, or data poisoning that alter LLM behavior within allowed operations.
- How is the integrity of custom verifiers and PEP binaries ensured (e.g., reproducible builds, code signing, SLSA levels)?
- Edge/embedded deployment
- How does the approach perform in constrained or offline environments (mobile, on-prem IoT) where registry access and frequent key verification are costly or intermittent?
- Standards and interoperability
- Is there a plan to standardize invocation formats, policy schemas, and attestation structures to foster ecosystem adoption beyond the nine frameworks?
- Diagnostics and developer experience
- What tools help developers understand “why” an operation was denied (policy diffing, trace visualization) and suggest minimal policy changes without breaking guarantees?
- Can MAPL policies be statically analyzed, type-checked, or formally verified to prevent unsafe patterns before deployment?
- Future-proofing cryptography
- How will the system migrate to post-quantum cryptography, and what is the impact on performance and key management during hybrid transitions?
These gaps suggest concrete research and engineering directions: formalizing and open-sourcing proofs and benchmarks; specifying standardized invocation/attestation schemas; building robust key/identity lifecycle tooling; developing liveness-aware policy analysis; establishing verifiable adapters and governance processes; and integrating confidentiality and information-flow controls alongside the presented integrity-focused framework.
Practical Applications
Immediate Applications
The following applications can be deployed now using the paper’s authenticated workflows, MAPL policy language, and the universal security runtime with thin adapters across nine frameworks.
- Enterprise AI trust layer for agentic systems
- What: Deploy the universal security runtime with Policy Enforcement Points (PEPs) across LangChain, CrewAI, AutoGen, LlamaIndex, Haystack, MCP, and LLM APIs (OpenAI, Claude) via thin adapters to protect prompts, tools, data, and context.
- Sectors: Software/IT, enterprise platforms
- Tools/products/workflows: PEP SDKs, MAPL policy authoring tools, agent/tool registries, audit-log services using hash chains/Merkle trees, admin dashboards
- Assumptions/dependencies: Enterprise IAM integration, key management (per agent/tool), trusted control plane (policy store, registry), adherence to L3 (enforcement integrity)
- Prompt-injection resilience in RAG pipelines
- What: Enforce document access policies and signed retrieval; treat data as untrusted; verify signed invocations and authenticated context to prevent data-triggered tool misuse.
- Sectors: Knowledge management, customer support, internal search
- Tools/products/workflows: LlamaIndex/Haystack adapters, StorageIntegrityVerifier, MAPL constraints on retrieval sources and parameters
- Assumptions/dependencies: Integration with vector DBs/document stores, correct configuration of resource/parameter constraints
- Secure tool invocation and least privilege for agents
- What: Apply MAPL policies and ToolAuthorizationVerifier to bound filesystem access, command execution, DB queries, and email sending; independently verify at each tool boundary.
- Sectors: Finance (ops automation), IT operations, back-office process automation
- Tools/products/workflows: Per-tool identities/keys, RBAC with MAPL, deny/allow lists for sensitive operations
- Assumptions/dependencies: Accurate resource modeling, parameter-level controls (e.g., path patterns, recipient allowlists)
- Audit-ready, non-repudiable AI operations
- What: Generate tamper-evident logs with hash chains; sign both invocations and results to enable forensic analysis and compliance reporting.
- Sectors: Compliance, risk, legal
- Tools/products/workflows: Audit-log services, signature verification pipelines, exportable evidence packages for SOC 2/HIPAA/GDPR controls
- Assumptions/dependencies: Secure log storage, proper time-stamping, clear retention policies
- Regulated data handling via attestations
- What: Enforce “export only after anonymization attestation exists” (or DLP checks completed) using MAPL attestation dependencies with cryptographic proofs.
- Sectors: Healthcare (PHI/PII), public sector, HR
- Tools/products/workflows: WorkflowIntegrityVerifier, PII detection verifiers, anonymization tools with signed completion attestations
- Assumptions/dependencies: Correct verifier configuration; acceptance of cryptographic attestations in internal compliance processes
- DevOps and cloud automation hardening
- What: Require signed, policy-bound invocations for infra operations (e.g., IaC changes, deployments); enforce command and API call constraints with independent verification at each boundary.
- Sectors: Software/cloud, platform engineering
- Tools/products/workflows: AutoGen code-execution wrappers, CI/CD adapters, geofencing/rate-limiting verifiers
- Assumptions/dependencies: Integration with cloud APIs, accurate resource/parameter policies, secure key storage
- Safer email assistants and communications tools
- What: Wrap send_email and messaging tools with PEPs; constrain recipients, domains, content types, and attachment sources; require signed operations.
- Sectors: Enterprise productivity, daily life
- Tools/products/workflows: Email tool adapters, MAPL recipient allowlists/denials, content scanning verifiers
- Assumptions/dependencies: Email API integration, policy maintenance for authorized recipients and headers
- Browser/scraper agent hardening against malicious content
- What: Treat page content as untrusted data; ensure tool actions (credential access, downloads, command execution) require signed, policy-bound invocations; mitigate Atlas-style prompt injection cascades.
- Sectors: Marketing intelligence, competitive analysis, data aggregation
- Tools/products/workflows: HTTP client wrappers with PEPs, content sanitization verifiers, strict tool parameter controls
- Assumptions/dependencies: Coverage of high-risk tools (credentials, file system), handling of dynamic content and redirects
- Scoped delegation and cross-team collaboration
- What: Use A2A-style signed delegation tokens with MAPL intersection semantics to ensure delegates cannot gain broader permissions than their grant.
- Sectors: Enterprise collaboration, IT governance
- Tools/products/workflows: Delegation token service, policy intersection workflows, revocation mechanisms
- Assumptions/dependencies: Registry and token lifecycle management, organizational hierarchy reflected in MAPL
- Context integrity in multi-turn sessions
- What: Apply authenticated context with hash chains, sequence numbers, and tamper-evident session state across LangChain/CrewAI/AutoGen memory.
- Sectors: Software/IT, customer support, sales assistants
- Tools/products/workflows: MemoryIntegrityVerifier, context signing, policy-bound memory operations
- Assumptions/dependencies: Integration with orchestration memory APIs, performance tuning for sub-ms overhead
- Security red-teaming and OWASP LLM Top 10 coverage
- What: Use the runtime’s deterministic enforcement to test agent applications against OWASP LLM risks; build on the reported empirical results (9/10 risks mitigated) and plug in risk-specific verifiers to extend coverage.
- Sectors: Cybersecurity, QA
- Tools/products/workflows: Red-teaming harnesses, risk-specific verifiers (path traversal, exfil prevention, workflow hijacking), reporting dashboards
- Assumptions/dependencies: Test case libraries, controlled staging environments
- Policy management at scale
- What: Replace O(M×N) rule sprawl with MAPL’s hierarchical O(log M + N) policies; use inheritance and intersection to enforce monotonic restriction and transitive denial.
- Sectors: IT governance, platform teams
- Tools/products/workflows: MAPL compiler/validator, org hierarchy importers, policy provenance and diff tooling
- Assumptions/dependencies: Well-maintained org hierarchies; careful use of extends chains and deny patterns
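The inheritance and intersection behavior described in the last item can be illustrated with a minimal sketch. It is not MAPL syntax and not the paper's implementation: the `Policy` class, the `extends` field as modeled here, the glob-style operation strings, and the example policies are all hypothetical, chosen only to show how "every level must allow, no level may deny" yields monotonic restriction and transitive denial.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for one policy node; this is not MAPL syntax."""
    name: str
    allow: frozenset                  # glob patterns for permitted operations
    deny: frozenset = frozenset()     # glob patterns that always reject
    extends: "Policy | None" = None   # parent policy in the hierarchy


def chain(policy):
    """Return the policy and all of its ancestors, leaf first."""
    nodes = []
    while policy is not None:
        nodes.append(policy)
        policy = policy.extends
    return nodes


def is_permitted(policy, operation):
    """Intersection semantics: every level must allow, no level may deny."""
    for node in chain(policy):
        if any(fnmatchcase(operation, pattern) for pattern in node.deny):
            return False  # transitive denial: an ancestor's deny cannot be undone
        if not any(fnmatchcase(operation, pattern) for pattern in node.allow):
            return False  # monotonic restriction: a child cannot widen its parent
    return True


# Hypothetical three-level hierarchy: organization -> team -> agent.
org = Policy(
    "org",
    allow=frozenset({"fs.read:*", "email.send:*"}),
    deny=frozenset({"fs.read:/etc/*"}),
)
team = Policy(
    "finance",
    allow=frozenset({"fs.read:/reports/*", "fs.read:/etc/*", "db.query:*"}),
    extends=org,
)
agent = Policy(
    "invoice-agent",
    allow=frozenset({"fs.read:/reports/2024/*"}),
    extends=team,
)

# Allowed at all three levels.
print(is_permitted(agent, "fs.read:/reports/2024/q1.csv"))
# The team lists db.query:*, but the org never granted it.
print(is_permitted(team, "db.query:ledger"))
# The team lists /etc/*, but the org denies it.
print(is_permitted(team, "fs.read:/etc/passwd"))
```

In this sketch, adding a pattern to a child's `allow` set can never grant an operation its ancestors withhold, so administrators would only need to reason about each level's own policy rather than about every agent and resource pair.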
Long-Term Applications
The following applications are promising but require further research, ecosystem scaling, protocol standardization, or vendor cooperation before broad deployment.
- Cross-vendor standardization of authenticated workflows and MAPL-like policies
- What: Establish open standards for agent identities, signed invocations, attestations, and intersection-based policy semantics across A2A/MCP/LLM APIs.
- Sectors: Software, policy
- Tools/products/workflows: RFCs/specs, interoperability test suites, certification programs
- Assumptions/dependencies: Multi-vendor alignment, standards bodies involvement, reference implementations
- Hardware-backed keys and secure enclaves for agent/tool identities
- What: Bind agent/tool keys to TPM/HSM/TEE/FIDO devices for stronger compromise resistance and regulated environment compliance.
- Sectors: Healthcare, finance, energy, defense
- Tools/products/workflows: Hardware key provisioning, attested execution, secure key rotation
- Assumptions/dependencies: Device support, supply chain readiness, FIPS/CC certifications
- Federated, cross-organization agent ecosystems
- What: Enable inter-company authenticated workflows with mutual policy intersection, delegation tokens, and transitive attestations for supply chain automation.
- Sectors: Logistics, finance (trade finance), manufacturing
- Tools/products/workflows: Federation registries, cross-org policy negotiation, legal/compliance overlays
- Assumptions/dependencies: Legal agreements (data-sharing, liability), interoperable trust layer adoption
- Certified marketplaces for agent tools and verifiers
- What: Create an ecosystem where tools/verifiers ship with cryptographic attestations and security profiles; enterprises choose certified components.
- Sectors: Software/platforms
- Tools/products/workflows: Marketplace portals, certification criteria, continuous attestation pipelines
- Assumptions/dependencies: Certification authorities, ongoing security audits, vulnerability disclosure processes
- Cloud-native managed trust layers
- What: Offer authenticated workflow enforcement as a managed service (PEP-as-a-Service) embedded in LLM providers and cloud platforms.
- Sectors: Cloud, SaaS
- Tools/products/workflows: Managed registries, policy stores, observability/incident tooling
- Assumptions/dependencies: Provider support, SLAs for sub-ms verification, multitenancy isolation
- Regulatory incorporation of cryptographically attested AI operations
- What: Update compliance frameworks (HIPAA, GDPR, SOC 2, PCI) to explicitly accept signed invocations/attestations and tamper-evident logs as controls.
- Sectors: Policy, compliance
- Tools/products/workflows: Regulator guidance, audit templates, evidence exporters
- Assumptions/dependencies: Regulator engagement, standards alignment, industry proofs-of-concept
- Authenticated command pipelines for robotics and industrial automation
- What: Require signed, policy-bound commands and attested execution order for robots/PLC/SCADA systems to prevent unsafe operations and escalation.
- Sectors: Robotics, manufacturing, energy
- Tools/products/workflows: Real-time PEPs, safety verifiers (area/force limits), attested maintenance workflows
- Assumptions/dependencies: Deterministic latency guarantees, integration with legacy controllers
- Smart grid and critical infrastructure protection
- What: Deploy authenticated workflows for grid control, telemetry retrieval, and incident response with non-repudiation and strict policy constraints.
- Sectors: Energy, utilities
- Tools/products/workflows: Grid control PEPs, geofencing and rate-limit policies, emergency access groups with time-bounded validity
- Assumptions/dependencies: Vendor cooperation, resilience under outages, secure failover
- Consumer-grade personal assistants with household policies
- What: Enforce signed, policy-bound actions across IoT devices (locks, thermostats, payments) to prevent unsafe or unauthorized assistant behavior.
- Sectors: Consumer/IoT, daily life
- Tools/products/workflows: Home agent OS, per-device identities, parent/guardian policy templates
- Assumptions/dependencies: Device ecosystem support, simple policy authoring UX, recovery from compromised keys
- Open academic testbeds for deterministic multi-agent security
- What: Provide research platforms with authenticated workflows, MAPL, and verifiers for studying compositional attacks and formal security guarantees.
- Sectors: Academia
- Tools/products/workflows: Open-source runtimes, attack libraries, coursework materials
- Assumptions/dependencies: Funding, community maintenance, standardized datasets/scenarios
- End-to-end attested knowledge ecosystems
- What: Track cryptographic provenance from content ingestion through RAG retrieval to tool actions, enabling trustworthy knowledge pipelines.
- Sectors: Education, enterprise knowledge, media
- Tools/products/workflows: Content signing, retrieval policies, provenance-aware UIs/reporting
- Assumptions/dependencies: Content-owner cooperation, signing infrastructure, performance trade-offs
- Automated policy synthesis and drift detection
- What: Use program analysis or LLM-assisted tooling to generate MAPL policies from workflows, detect drift, and suggest least-privilege updates.
- Sectors: Software/IT governance
- Tools/products/workflows: Policy synthesis engines, explainable diffs, verification sandboxes
- Assumptions/dependencies: Reliable model-guided synthesis, human-in-the-loop validation, safety guarantees against over-permissive outputs
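The drift-detection idea in the last item can be sketched in its simplest form: compare what a policy grants with what an agent was actually observed doing. Everything here is an assumption for illustration, including the `(tool, action)` pair representation, the `detect_drift` helper, and the sample data; real MAPL policies use patterns and constraints that exact-match sets do not capture, and any suggested change would still need human review.

```python
def detect_drift(granted, observed):
    """Compare granted (tool, action) pairs with observed invocations.

    granted:  set of (tool, action) pairs the policy permits (hypothetical form)
    observed: iterable of (tool, action) pairs seen in the audit log
    """
    seen = set(observed)
    return {
        # Granted but never exercised: candidates for removal.
        "unused_grants": sorted(granted - seen),
        # Attempted but not granted: would be rejected at enforcement time.
        "out_of_policy": sorted(seen - granted),
        # Smallest exact-match grant set that covers the legitimate usage.
        "least_privilege": sorted(granted & seen),
    }


# Hypothetical policy grants and audit-log entries.
granted = {
    ("filesystem", "read"),
    ("filesystem", "write"),
    ("email", "send"),
}
observed = [
    ("filesystem", "read"),
    ("filesystem", "read"),
    ("email", "send"),
    ("shell", "exec"),
]

report = detect_drift(granted, observed)
print(report["unused_grants"])
print(report["out_of_policy"])
print(report["least_privilege"])
```

A production tool would have to account for rarely used but legitimate grants (for example, incident-response permissions) before proposing removals, which is one reason the item above lists human-in-the-loop validation as a dependency.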