- The paper introduces SentinelAgent, a formal security framework that enforces authority narrowing, intent preservation, and compliance across multi-agent delegation chains.
- It employs a Delegation Chain Calculus (DCC) with a three-layer protocol (pre-, at-, and post-execution) that combines deterministic checks with probabilistic intent verification.
- On DelegationBench v4, the full verification pipeline reports 100% attack detection with 0% benign false positives at roughly 20 ms per delegation lifecycle, supporting its practical viability for high-assurance federal AI deployments.
Overview and Motivation
The SentinelAgent framework introduces a rigorous model and enforcement mechanism for securing delegation chains in federal multi-agent AI systems, where agents autonomously invoke actions across organizational, trust, and infrastructure boundaries. Traditional conversational AI, which is primarily text-generative, presents a comparatively simpler attack surface. In contrast, agentic AI systems operate in an environment where agents can execute API calls, manage sensitive data, and autonomously make critical decisions, significantly increasing the risk profile. Operational environments within the federal domain are especially concerning due to their stringent compliance requirements (e.g., NIST 800-53, Privacy Act) and pronounced adversarial interest.
The work is motivated by a recent surge in AI-enabled attacks and by documented multi-stage agentic attacks in the wild, such as the GTG-1002 campaign reported by Anthropic, which indicate that status quo access management and compliance mechanisms are insufficient. Furthermore, most existing tooling either applies single-agent controls, lacks formal compositionality across delegation, or fails to account for the full spectrum of threat vectors. SentinelAgent addresses the resulting "delegation accountability gap," introducing a formal framework with strong enforcement guarantees for authority, intent, compliance, forensics, and containment across arbitrarily deep agentic delegation chains.
At the core of SentinelAgent is the Delegation Chain Calculus (DCC), which models delegation as an ordered chain of cryptographically signed tokens, each representing a delegation step from one agent to another. Each token encapsulates authority scope, intent as a vectorized natural language goal, policy constraints, cryptographic hash linkage for chain continuity, and deterministic expiry.
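The token structure described above can be illustrated with a minimal sketch. The field names, the JSON serialization, and the use of an HMAC as a stand-in for a real digital signature are all assumptions made for illustration; the source lists what a token contains but not its wire format or signing scheme. Intent is kept as a plain string here rather than a vector.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace

GENESIS = "0" * 64  # hypothetical prev_hash value for the first token in a chain


@dataclass(frozen=True)
class DelegationToken:
    # Hypothetical field names mirroring the contents listed in the text.
    delegator: str
    delegate: str
    scope: frozenset      # authority scope
    intent: str           # natural-language goal (the paper vectorizes this)
    policies: frozenset   # policy constraints
    prev_hash: str        # hash linkage to the previous token
    expires_at: int       # deterministic expiry, Unix seconds
    signature: str = ""


def _payload(token):
    """Canonical bytes covering every field except the signature."""
    body = {
        "delegator": token.delegator,
        "delegate": token.delegate,
        "scope": sorted(token.scope),
        "intent": token.intent,
        "policies": sorted(token.policies),
        "prev_hash": token.prev_hash,
        "expires_at": token.expires_at,
    }
    return json.dumps(body, sort_keys=True).encode()


def token_hash(token):
    """Hash over payload and signature, used as the next token's prev_hash."""
    return hashlib.sha256(_payload(token) + token.signature.encode()).hexdigest()


def issue(delegator, delegate, scope, intent, policies, prev, expires_at, key):
    """Create and sign a token linked to `prev` (or to GENESIS if prev is None)."""
    prev_hash = token_hash(prev) if prev is not None else GENESIS
    token = DelegationToken(delegator, delegate, frozenset(scope), intent,
                            frozenset(policies), prev_hash, expires_at)
    # HMAC is a stand-in for an asymmetric signature (assumption).
    sig = hmac.new(key, _payload(token), hashlib.sha256).hexdigest()
    return replace(token, signature=sig)


def verify_chain(chain, key, now):
    """Single O(n) pass: check signature, hash linkage, and expiry of each token."""
    prev = GENESIS
    for token in chain:
        expected = hmac.new(key, _payload(token), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, token.signature):
            return False
        if token.prev_hash != prev or token.expires_at <= now:
            return False
        prev = token_hash(token)
    return True
```

Because each token commits to the hash of its predecessor, altering any field of an earlier token invalidates either that token's signature or the linkage of every later token, which is what makes the chain tamper-evident.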
DCC defines and enforces seven critical properties:
- Authority Narrowing (P1): Authority attenuation is monotonic; no agent can delegate more authority than it received ($\sigma_{i+1} \subseteq \sigma_i$). Enforced by DAS at token issuance, this blocks classical privilege escalation.
- Intent Preservation (P2): Chain-wide entailment between the root intent and all subtasks, enforced via a three-layer protocol (keyword filter, context-enriched NLI, benign override heuristics). P2 is intentionally probabilistic due to demonstrated theoretical and empirical infeasibility of deterministic semantic intent verification.
- Policy Conjunction Preservation (P3): NIST 800-53 and other compliance controls are preserved across delegations ($\pi_0 \subseteq \pi_j$). Policy set union is enforced at each cross-organizational step.
- Forensic Reconstructibility (P4): Every action is traceable via O(n) hash chain traversal; all tokens and delegations are tamper-evident.
- Cascade Containment (P5): Adversarial blast radius is formally bounded as a function of tool risk tier and heartbeat interval, supporting synchronous, cached, and audit-tiered containment.
- Scope-Action Conformance (P6): API calls are precisely enforced at runtime via manifest whitelists; only actions explicitly permitted by the narrowed scope can be executed.
- Output Schema Conformance (P7): Post-execution outputs are validated against default-deny type-tag schemas; only permitted output types per scope are allowed.
Properties P1, P3–P7 are deterministic and exhaustively model-checked via TLA+ across 2.7 million states with zero observed violations. P2 is probabilistic, bounding attack detection by fine-tuned model performance and explicitly characterizing errors due to the "adversarial intent paraphrasing" problem.
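Two of the deterministic properties reduce to plain set comparisons, which the following sketch illustrates. It assumes scopes and policy sets are represented as Python sets of string labels; the function names and the control labels in the usage example are illustrative and do not come from the paper.

```python
def narrows_authority(parent_scope, child_scope):
    """P1: the delegated scope must be a subset of the scope received."""
    return set(child_scope) <= set(parent_scope)


def preserves_policies(root_policies, child_policies):
    """P3: every root policy must still be present at the delegated step."""
    return set(root_policies) <= set(child_policies)


def check_chain(scopes, policies):
    """Return (step index, property) pairs for each violation found in a chain.

    scopes[i] and policies[i] describe delegation step i; step 0 is the root.
    """
    violations = []
    for i in range(1, len(scopes)):
        if not narrows_authority(scopes[i - 1], scopes[i]):
            violations.append((i, "P1"))
        if not preserves_policies(policies[0], policies[i]):
            violations.append((i, "P3"))
    return violations


# Step 2 tries to add "delete" (P1 violation) and drops "AC-3" (P3 violation).
scopes = [{"read", "write"}, {"read"}, {"read", "delete"}]
policies = [{"AC-3"}, {"AC-3", "AU-2"}, {"AU-2"}]
print(check_chain(scopes, policies))  # [(2, 'P1'), (2, 'P3')]
```

Checks of this shape have no model in the loop, which is why they can be evaluated deterministically at token issuance.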
Protocol and Enforcement: Intent-Preserving Delegation Protocol
SentinelAgent employs the Intent-Preserving Delegation Protocol (IPDP), centered on a Delegation Authority Service (DAS). The DAS—designed to be robust against prompt injection and LLM-based compromise—acts as a gatekeeper for all delegation events. Enforcement is structured as a three-point lifecycle:
- Pre-execution (P2): Intent verification, leveraging safety keywords, context-enriched NLI classification, and benign-override heuristics. Standalone, P2 achieves 88.3% attack TPR after domain-specific fine-tuning; in adversarial paraphrasing settings, this drops to 13%.
- At-execution (P6): DAS scope enforcement, proxied at the API boundary, ensures that only permitted actions (as refined by scope narrowing and the runtime manifest) are invocable.
- Post-execution (P7): Output Validator checks type tags against schema; non-conformant or suspicious outputs are blocked.
The comprehensive defense-in-depth strategy ensures that if any single property is evaded (e.g., P2 via paraphrasing), subsequent enforcement layers constrain adversarial success to a strictly bounded envelope.
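The three enforcement points can be sketched as a short pipeline. This is a simplified illustration under stated assumptions: the pre-execution check implements only a keyword filter (the NLI classifier and benign-override heuristics are omitted), and the `Scope` type, the keyword list, and the action and output-type names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the paper's actual filter is not given in the text.
BLOCKED_KEYWORDS = {"exfiltrate", "disable logging"}


@dataclass(frozen=True)
class Scope:
    allowed_actions: frozenset        # manifest whitelist consulted for P6
    allowed_output_types: frozenset   # default-deny output type tags for P7


def pre_execution_check(root_intent, subtask):
    """P2 stand-in: keyword layer only. root_intent is unused in this stub;
    an NLI model would compare it against the subtask."""
    text = subtask.lower()
    return not any(keyword in text for keyword in BLOCKED_KEYWORDS)


def at_execution_check(scope, action):
    """P6: only explicitly whitelisted actions may be invoked."""
    return action in scope.allowed_actions


def post_execution_check(scope, output_type):
    """P7: default-deny, so unknown output types are rejected."""
    return output_type in scope.allowed_output_types


def run_step(scope, root_intent, subtask, action, tool):
    """Run one delegated step through the three checks in order.

    `tool` is a zero-argument callable returning (output_type, value).
    """
    if not pre_execution_check(root_intent, subtask):
        return ("blocked", "P2")
    if not at_execution_check(scope, action):
        return ("blocked", "P6")
    output_type, value = tool()
    if not post_execution_check(scope, output_type):
        return ("blocked", "P7")
    return ("ok", value)


scope = Scope(frozenset({"read_report"}), frozenset({"summary"}))
# A paraphrased request slips past the keyword filter but is stopped at P6.
print(run_step(scope, "summarize the report", "copy the report to my server",
               "http_post", lambda: ("summary", "...")))  # ('blocked', 'P6')
```

The usage example mirrors the paraphrasing case in the text: the probabilistic layer misses the request, and the deterministic whitelist still prevents the action from running.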
Evaluation and Empirical Results
Experiments deployed SentinelAgent on DelegationBench v4, encompassing 516 scenarios (150 adversarial across 10 attack categories; 366 benign, covering 13 domains). Major results:
- The full three-layer verification pipeline (P2+P6+P7) achieves 100% attack TPR with 0% benign FPR.
- Deterministic properties (P1, P3–P7) showed no violations under any tested attack condition, including black-box and protocol-level adversaries.
- DAS prototype demonstrates low operational overhead: 8.9 ms token issuance latency; <1 ms per-step runtime enforcement (for P6/P7); overall ∼20 ms per delegation lifecycle (dominated by NLI inference).
- Red-team and live LLM agent evaluation confirm robustness in real-world settings, including composed multi-agent attacks and LLM-generated delegations.
Meta-theoretic analysis exhaustively constructs and validates all 126 non-trivial property-evasion combinations, proving property minimality (removal of any property leads to a concrete attack), graceful degradation (bounded damage on partial evasion), defense-in-depth completeness (all partial combinations constrain the adversary), and composition safety (chains sharing mutable state remain secure with write-impact re-verification).
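The figure of 126 matches the number of non-empty proper subsets of seven properties, $2^7 - 2$. The source does not state how the count was derived, so reading it this way is an assumption; the short enumeration below only confirms the arithmetic.

```python
from itertools import combinations

PROPERTIES = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]


def nontrivial_subsets(items):
    """All subsets except the empty set and the full set."""
    return [frozenset(combo)
            for size in range(1, len(items))
            for combo in combinations(items, size)]


subsets = nontrivial_subsets(PROPERTIES)
print(len(subsets))  # 126
```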
Theoretical and Practical Implications
SentinelAgent explicitly addresses two longstanding security challenges in agentic systems:
- Delegation Chain Accountability: By formally modeling the complete lineage of actions and intent preservation through multiple agent hops, SentinelAgent addresses the forensic and compliance audit challenges of federal and enterprise settings. Practical deployment can support post-hoc investigation, real-time containment, and transparent audit against regulatory standards (20 NIST 800-53 controls across 9 families are mapped to directly enforced properties).
- Semantic Intent Verification Limits: Through a theoretical analogy to Rice's theorem and an empirical ambiguity analysis, the work argues that deterministic intent entailment is infeasible for natural language delegation. The architecture therefore assigns P2 a probabilistic, defense-in-depth role, while the remaining deterministic properties strictly police the action and output surfaces accessible to agents.
As long as DAS integrity holds (supported by fault-tolerant, multi-party signing and HSM-based safeguards), SentinelAgent's guarantees constrain adversarial agents even when they employ advanced techniques such as paraphrased malicious intent, output manipulation, or protocol-spanning attacks.
Future Directions
Open challenges identified include extending the DCC to formalize citizen consent for Privacy Act compliance, closing the gap on agent collusion via implicit channels, and scaling write-impact notification for high-concurrency systems. Limitations in semantic output validation (for content within permitted output types) remain an open research area, as does optimizing administrative and operational overhead in manifest/schema maintenance.
The protocol's architectural principles are transport-agnostic and suited for integration with major agentic frameworks (e.g., LangChain, CrewAI). Per-step performance overhead is negligible with CPU-only inference, suggesting feasibility at federal-scale workloads.
Conclusion
SentinelAgent provides a comprehensive, formally-grounded, and operationally viable solution for secure delegation in multi-agent AI systems, especially in high-assurance federal contexts. Its layered guarantees tightly bind authority, intent, compliance, forensics, and containment across arbitrarily complex agentic workflows. Critical theoretical insights into the limits of deterministic intent verification are paired with practical, high-accuracy enforcement mechanisms that, in aggregate, yield empirical and model-checked high-assurance security. While future work is required to fully address issues such as consent modeling, implicit collusion, and semantic output validation, SentinelAgent closes the foundational gap for controlling, auditing, and securing autonomous delegation in the most demanding AI deployment environments.