
Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems

Published 18 Sep 2025 in cs.AI and cs.MA | (2509.14956v1)

Abstract: This paper proposes a novel architectural framework aimed at enhancing security and reliability in multi-agent systems (MAS). A central component of this framework is a network of Sentinel Agents, functioning as a distributed security layer that integrates techniques such as semantic analysis via LLMs, behavioral analytics, retrieval-augmented verification, and cross-agent anomaly detection. Such agents can potentially oversee inter-agent communications, identify potential threats, enforce privacy and access controls, and maintain comprehensive audit records. Complementary to the idea of Sentinel Agents is the use of a Coordinator Agent. The Coordinator Agent supervises policy implementation, and manages agent participation. In addition, the Coordinator also ingests alerts from Sentinel Agents. Based on these alerts, it can adapt policies, isolate or quarantine misbehaving agents, and contain threats to maintain the integrity of the MAS ecosystem. This dual-layered security approach, combining the continuous monitoring of Sentinel Agents with the governance functions of Coordinator Agents, supports dynamic and adaptive defense mechanisms against a range of threats, including prompt injection, collusive agent behavior, hallucinations generated by LLMs, privacy breaches, and coordinated multi-agent attacks. In addition to the architectural design, we present a simulation study where 162 synthetic attacks of different families (prompt injection, hallucination, and data exfiltration) were injected into a multi-agent conversational environment. The Sentinel Agents successfully detected the attack attempts, confirming the practical feasibility of the proposed monitoring approach. The framework also offers enhanced system observability, supports regulatory compliance, and enables policy evolution over time.

Summary

  • The paper presents a framework where Sentinel Agents monitor agent interactions to detect prompt injection and malicious behaviors, achieving 100% detection in tests.
  • The methodology utilizes sidecar, LLM proxy, and continuous listener patterns combined with regex and anomaly detection for effective threat mitigation.
  • Experimental evaluations validate the approach, highlighting its potential for scalable security enhancements in multi-agent systems while outlining future research directions.


The paper "Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems" (2509.14956) introduces a comprehensive architectural framework aimed at enhancing the security and reliability of multi-agent systems (MAS). By proposing the integration of Sentinel Agents, the framework addresses various threats posed in conversational AI environments, ranging from prompt injection to malicious agent behavior and privacy breaches. This essay will outline the Sentinel Agents' role and architecture, explore practical applications, compare this approach with existing frameworks, and conclude with experimental evaluations and future directions.

Sentinel Agent Architecture

Core Concept and Design

Sentinel Agents are conceived as specialized AI components embedded within MAS to monitor and analyze interactions between agents. Their role is to detect anomalies, apply security policies, and identify potential threats in real time. Unlike traditional agent-specific security measures, Sentinel Agents offer a standardized, globally aware layer of protection by separating operational logic from security controls.
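The paper does not prescribe a concrete API, but the separation of security checks from operational logic can be sketched as follows. All names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    BLOCK = "block"

@dataclass
class Message:
    sender: str
    recipient: str
    content: str

class SentinelAgent:
    """Hypothetical security layer kept separate from agent business logic."""

    def __init__(self, checks):
        # checks: list of callables Message -> Verdict; each could wrap
        # regex rules, an LLM-based semantic classifier, or behavioral analytics.
        self.checks = checks
        self.audit_log = []

    def inspect(self, msg: Message) -> Verdict:
        verdict = Verdict.ALLOW
        for check in self.checks:
            result = check(msg)
            # Escalate to the most severe verdict any check returns.
            if result == Verdict.BLOCK:
                verdict = Verdict.BLOCK
            elif result == Verdict.FLAG and verdict == Verdict.ALLOW:
                verdict = Verdict.FLAG
        # Every inspection is recorded, mirroring the paper's audit-trail requirement.
        self.audit_log.append((msg.sender, msg.recipient, verdict.value))
        return verdict
```

Because the checks are plain callables, the same Sentinel core can be reused across the deployment patterns described below.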

Deployment Patterns

The paper outlines several architectural patterns for deploying Sentinel Agents:

  • Sidecar Pattern: A Sentinel Agent operates as a companion process to each main agent, intercepting and inspecting interactions locally. This pattern ensures low latency and fine-grained control but adds overhead for each agent (Figure 1).

    Figure 1: Sidecar Pattern for Sentinel Agent

  • LLM Proxy Pattern: A centralized proxy that filters all transactional data, enforcing global policies and maintaining consistency across agents. This pattern is more scalable but can introduce latency if not optimized (Figure 2).

    Figure 2: LLM Proxy Pattern for Sentinel Agent

  • Continuous Listener Pattern: An independent Sentinel Agent observes the entire conversational space without intervening directly, making it ideal for detecting systemic trends or threats (Figure 3).

    Figure 3: Continuous Listener Pattern: Sentinel Agent as an independent, system-wide observer of the Shared Conversational Space.
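As a concrete illustration of the sidecar pattern above, a sentinel can wrap an agent's outbound send path so every message is inspected before delivery. This is a minimal hypothetical sketch, not the paper's implementation; the proxy pattern would centralize the same check behind a shared gateway:

```python
class SidecarSentinel:
    """Per-agent companion that intercepts messages locally (hypothetical sketch)."""

    def __init__(self, agent_send, inspect):
        self._send = agent_send    # the wrapped agent's original send callable
        self._inspect = inspect    # callable: str -> bool, True means safe

    def send(self, message: str) -> dict:
        # Inspect in-process before forwarding; drop anything flagged unsafe.
        if not self._inspect(message):
            return {"delivered": False, "reason": "blocked by sentinel"}
        return {"delivered": True, "response": self._send(message)}
```

Because the check runs in-process next to each agent, latency stays low, at the cost of one sentinel instance per agent, matching the trade-off noted above.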

Comparisons with other protocols like MCP, A2A, and ANP reveal that while those focus on interoperability and direct agent-to-agent communication, Sentinel Agents emphasize overall conversational security and integrity.

Practical Applications

Addressing Threats

  1. Prompt Injection Defense: Sentinel Agents employ layered regex filtering, anomaly detection, and adaptive countermeasures to identify and neutralize prompt injection attempts, and allow detection rules to evolve as new attack intelligence is gathered.
  2. Malicious Behavior: By analyzing sequences of messages rather than isolated utterances, Sentinel Agents can detect deceptive patterns, collusion, and other malicious activities, triggering appropriate administrative responses via the Coordinator Agent.
  3. Hallucination Mitigation: By integrating external factual databases and cross-agent consensus strategies, Sentinel Agents help identify and correct erroneous LLM outputs quickly.
  4. Privacy Protections: By enforcing identity concealment and continuous anomaly detection, Sentinel Agents support compliance with data protection frameworks, safeguarding personally identifiable information from unauthorized access.
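The layered detection in item 1 can be sketched with two cheap first-pass layers. The patterns and threshold below are illustrative placeholders, not the paper's actual rule set; a production sentinel would add LLM-based semantic analysis on top:

```python
import re

# Illustrative patterns only; a real deployment would use a curated,
# continuously updated rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
    re.compile(r"reveal your (system )?prompt", re.I),
]

def regex_layer(text: str) -> bool:
    """First layer: match known injection phrasings."""
    return any(p.search(text) for p in INJECTION_PATTERNS)

def anomaly_layer(text: str, max_special_ratio: float = 0.3) -> bool:
    """Second layer: crude anomaly signal based on an unusually high
    ratio of non-alphanumeric characters, which can indicate obfuscated
    or encoded payloads."""
    if not text:
        return False
    special = sum(1 for c in text if not c.isalnum() and not c.isspace())
    return special / len(text) > max_special_ratio

def is_injection(text: str) -> bool:
    # A message is flagged if any layer fires.
    return regex_layer(text) or anomaly_layer(text)
```

Flagged messages would then be escalated to the Coordinator Agent, which can quarantine the sender or tighten policies.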

Comparative Analysis

Position within Existing AI Security Frameworks

Sentinel Agents offer a pragmatic solution that effectively transforms theoretical security concepts of major AI risk frameworks (such as NIST AI RMF and OWASP Top 10 for LLMs) into operational protocols. By providing both proactive and reactive security measures, Sentinel Agents complement these frameworks' general guidelines with concrete implementations that align security processes with MAS's specific requirements.

Experimental Evaluation

Proof-of-Concept Study

A proof-of-concept prototype exercising the Sentinel Agents' layered detection capabilities was tested against 162 adversarial prompts. The Sentinel Agent achieved a 100% detection rate, validating the proposed architectural components against diverse adversarial threats (Figure 4).

Figure 4: Prompt-injection risk distribution across 110 adversarial attempts.

Limitations and Future Work

Although successful in detecting synthetic adversaries, this evaluation remains preliminary. Future work should include detailed ablation studies, more comprehensive datasets that mix in benign examples, and quantitative assessment of false positive rates for robust benchmarking (Figure 5).

Figure 5: Detection rate (%) by attack type, including prompt injection, hallucination, and data exfiltration attempts.

Conclusion

By introducing Sentinel Agents into MAS, this framework supports secure, adaptable, and cross-platform agent interactions essential for modern AI applications. While the results are promising, further research is needed to confirm its efficacy in varied real-world contexts. The paper lays a foundation for making MAS resilient against evolving threats through continuously evolving trust, governance, and security layers. Future studies should focus on scalability, unit economics, and ethical implementation challenges in diverse industry settings.


Glossary

  • A2A: An agent communication protocol acronym referenced as a comparator; specifics are not elaborated in the paper. "protocols (MCP, A2A, ANP, SLOP)"
  • ANP: An agent communication protocol acronym referenced as a comparator; specifics are not elaborated in the paper. "protocols (MCP, A2A, ANP, SLOP)"
  • AgentFlayer: A security research tool demonstrating agentic prompt-injection risks and data exfiltration. "Zenity Labs’ AgentFlayer illustrates this risk, showing how hidden instructions in benign documents can exfiltrate data without user interaction"
  • Agentic AI: AI systems composed of autonomous agents capable of acting and coordinating to achieve goals. "to establish a robust and adaptive security layer for Agentic AI."
  • AI Gateway: An intermediary service that enforces security, filters traffic, and manages LLM access and routing. "Such a proxy can include components like an API Gateway, an AI Gateway, data processing modules, and response formatting capabilities."
  • AI Observability: Practices and tooling to monitor AI systems’ behavior, performance, and risks in real time. "They can also power AI observability: generating real-time telemetry, detecting performance anomalies, and surfacing bias or bottlenecks."
  • Anomaly detection: Techniques to spot abnormal patterns in behavior or traffic that may indicate threats. "applying anomaly detection and rate-limiting"
  • API Gateway: A gateway layer that manages, secures, and routes API calls to backend services or models. "Such a proxy can include components like an API Gateway, an AI Gateway, data processing modules, and response formatting capabilities."
  • Audit trail: A tamper-resistant log of actions and decisions to support accountability and compliance. "Every action taken by the Sentinel Agent is logged into an audit trail to ensure accountability, transparency, and compliance."
  • Behavioral analytics: Analysis of sequences and patterns of interactions to detect misuse or collusion. "behavioral analytics"
  • Chain-of-Thought (CoT): A prompting strategy that elicits step-by-step reasoning to improve detection and analysis. "prompting strategies like Chain-of-Thought (CoT) and Few-shot classification are commonly employed."
  • Continuous Listener pattern: A deployment where a monitoring agent passively observes all system traffic for anomalies. "The Continuous Listener pattern positions a Sentinel Agent as an independent observer"
  • Coordinator Agent: The governance component that disseminates policy, manages agents, and responds to alerts. "The Coordinator Agent supervises policy implementation, and manages agent participation."
  • Convener Agent: A coordinating role that manages turns and shared conversational space in certain frameworks. "managed by a Coordinator Agent called the 'Convener Agent'"
  • Data exfiltration: Unauthorized extraction of sensitive data from a system or agent environment. "(prompt injection, hallucination, and data exfiltration)"
  • External Fact-Checking APIs: Third-party services used to verify factual claims made by agents. "External Fact-Checking APIs"
  • Few-shot classification: Classification that uses a small number of labeled examples in the prompt to guide model behavior. "prompting strategies like Chain-of-Thought (CoT) and Few-shot classification are commonly employed."
  • GDPR: A European regulation governing data protection and privacy. "General Data Protection Regulation (GDPR)"
  • HIPAA: A U.S. law for protecting health information privacy and security. "Health Insurance Portability and Accountability Act (HIPAA)"
  • Hybrid Approach: A combined deployment using sidecar, proxy, and listener patterns for comprehensive defense. "Hybrid Approach"
  • LLM: A neural network trained on large corpora to understand and generate natural language. "LLM hallucinations occur when models produce outputs that appear plausible but lack factual support."
  • LLM Proxy: A centralized or distributed intermediary that filters, secures, and optimizes LLM traffic. "LLM Proxy or AI Gateway pattern"
  • Linda-style shared memory: A coordination model where agents communicate via a shared tuple space. "Tuple Spaces and Linda-style shared memory"
  • Model Context Protocol (MCP): An interoperability standard for connecting tools and agents to models. "Model Context Protocol (MCP)"
  • Multi-Agent Systems (MAS): Systems composed of multiple interacting autonomous agents. "multi-agent systems (MAS)"
  • Open Policy Agent: A policy engine enabling policy-as-code for authorization and governance. "policy-as-code paradigm (e.g., inspired by Open Policy Agent)."
  • OWASP Top 10 for LLMs: A community-curated list of the most critical security risks for LLM systems. "The severity of this issue has been acknowledged by the OWASP Top 10 for LLMs"
  • Personally Identifiable Information (PII): Data that can identify an individual, requiring special handling and protection. "personally identifiable information (PII)"
  • Policy-as-code: Defining and enforcing policies via machine-readable code for consistency and automation. "policy-as-code paradigm (e.g., inspired by Open Policy Agent)."
  • Polyglot environments: Systems involving multiple programming languages within one architecture. "supports heterogeneous programming languages (polyglot environments)"
  • Prompt injection: Crafting adversarial inputs that manipulate model behavior or bypass safeguards. "Prompt injection is widely recognized as a leading security threat in AI systems."
  • Provenance tracking: Recording sources and derivation paths of information to assess trustworthiness. "confidence scoring, and provenance tracking"
  • Quarantine: Isolating compromised or misbehaving agents to contain threats. "isolate or quarantine misbehaving agents"
  • Rate-limiting: Restricting the frequency of requests or actions to mitigate abuse or attacks. "applying anomaly detection and rate-limiting"
  • Retrieval-augmented verification: Validating content by checking claims against external knowledge sources. "retrieval-augmented verification"
  • Secure-by-design principles: Building security controls into architecture and processes from the outset. "This aligns well with secure-by-design principles."
  • Security control plane: The layer that governs, coordinates, and enforces security policies across agents. "This relationship forms a security control plane for MAS."
  • Sentinel Agents: Dedicated monitoring and enforcement agents providing security and observability in MAS. "A central component of this framework is a network of Sentinel Agents"
  • Shared Conversational Space: A common context where agents exchange messages and coordinate actions. "The 'Shared Conversational Space' concept aligns with emerging architectures in multi-agent orchestration."
  • Sidecar pattern: Co-locating a helper component alongside a primary service for interception and security. "Sidecar Pattern"
  • SLOP: An agent communication protocol acronym referenced as a comparator; specifics are not elaborated in the paper. "protocols (MCP, A2A, ANP, SLOP)"
  • Telemetry: Operational signals emitted by systems to enable monitoring and diagnostics. "generating real-time telemetry"
  • Time-series analysis: Methods for analyzing sequential data over time to detect trends or anomalies. "Time-series analysis plays a central role in this process."
  • Tuple Spaces: A coordination model where agents communicate via inserting and reading tuples in shared space. "Tuple Spaces and Linda-style shared memory"
  • Zero-shot classification: Classification without labeled examples for the specific classes, leveraging model generalization. "Zero-shot classification approaches are especially valuable in dynamic environments"


Authors (2)
