QuadSentinel: Modular Guard for LLM Systems

Updated 25 December 2025
  • QuadSentinel is a modular multi-agent guard architecture that formalizes safety policies using Boolean predicates and sequents.
  • It employs four specialized agents—State Tracker, Threat Watcher, Policy Verifier, and Referee—to monitor and control inter-agent actions in real time.
  • Empirical evaluations on benchmarks demonstrate improved accuracy, precision, recall, and runtime efficiency compared to single-agent systems.

QuadSentinel is a modular multi-agent guard architecture for real-time enforcement of formal safety policies in large language model (LLM)-based multi-agent systems. It converts ambiguous, context-dependent natural language safety requirements into machine-checkable sequents over Boolean predicates, enabling accurate and efficient online control over agent actions and inter-agent messages. Four collaborating agents (State Tracker, Threat Watcher, Policy Verifier, and Referee) each serve a distinct role in policy enforcement. QuadSentinel achieves improved guardrail accuracy, rule recall, and a reduced false positive rate (FPR) over prior single-agent baselines on benchmarks such as ST-WebAgentBench and AgentHarm, all without modifying core agent logic (Yang et al., 18 Dec 2025).

1. Sequent-Based Policy Formalism

QuadSentinel operationalizes safety as a set of formal sequents built over a finite collection of Boolean predicates $\mathcal{P} = \{ p_1, p_2, \dots \}$ that index observable environment state and agent interactions. At each timestep $t$, the system maintains $\Gamma_t \subseteq \mathcal{P}$ as the set of currently true predicates, which are used to evaluate policy compliance.

Natural-language safety policies (e.g., "Do not publish sensitive info") are compiled offline into propositional logic rules, such as

$$\phi \coloneqq \neg(\mathit{sensitive\_info} \wedge \mathit{publish\_content})$$

These are restated as safety obligations $\psi_\phi$, with sequents $\Gamma_t \vdash \psi_\phi$ defining at runtime whether the current state and interaction satisfy the policy. The implication

$$\left( \bigwedge_{p \in \Gamma_t} p \right) \Rightarrow \psi_\phi$$

is checked for each active rule; failure triggers an enforcement block. For example, if both $\mathit{sensitive\_info}$ and $\mathit{publish\_content}$ are true in $\Gamma_t$, the data leakage rule is violated and the action is blocked.
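The check described above can be illustrated with a minimal sketch. It assumes that $\Gamma_t$ is held as a plain set of predicate names and that each rule is an ordinary Python function; the names `Rule`, `RULES`, and `violated_rules` are hypothetical and do not come from the source.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# A rule receives Gamma_t (the predicates currently true) and returns
# True when its safety obligation holds. This representation is an assumption.
Rule = Callable[[FrozenSet[str]], bool]


def no_data_leak(gamma: FrozenSet[str]) -> bool:
    """phi := not (sensitive_info and publish_content)."""
    return not ("sensitive_info" in gamma and "publish_content" in gamma)


RULES: Dict[str, Rule] = {"no_data_leak": no_data_leak}


def violated_rules(gamma: Set[str], rules: Dict[str, Rule]) -> List[str]:
    """Return the names of all rules whose obligation fails under gamma."""
    frozen = frozenset(gamma)
    return [name for name, rule in rules.items() if not rule(frozen)]


if __name__ == "__main__":
    # Only one of the two predicates holds, so nothing is violated.
    print(violated_rules({"sensitive_info"}, RULES))
    # Both predicates hold, so the data leakage rule is reported.
    print(violated_rules({"sensitive_info", "publish_content"}, RULES))
```

A non-empty result would correspond to an enforcement block in the text above.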

2. Four-Agent Guard Structure

QuadSentinel's guard intercepts every inter-agent message or action and maintains the complete control state. The pipeline comprises the following specialized subagents:

2.1 State Tracker

The State Tracker maintains $\Gamma_t$ by updating only a candidate set of predicates: the top-$k$ predicates retrieved via semantic similarity to the intercepted message, together with the predicates required by static dependencies. Updates are performed by a high-precision LLM, and all non-candidate predicates retain their prior values (the frame assumption).
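A small sketch may make the frame assumption concrete. It is illustrative only: the keyword-based `keyword_evaluator` is a hypothetical stand-in for the high-precision LLM, and the predicate names and `update_state` signature are assumptions, not part of the source.

```python
from typing import Callable, Dict, Iterable

# Hypothetical keyword table used in place of an LLM judgement.
KEYWORDS = {"sensitive_info": "password", "publish_content": "post"}


def keyword_evaluator(predicate: str, message: str) -> bool:
    """Stand-in evaluator: a predicate is true if its keyword appears."""
    keyword = KEYWORDS.get(predicate)
    return keyword is not None and keyword in message.lower()


def update_state(
    state: Dict[str, bool],
    candidates: Iterable[str],
    evaluate: Callable[[str, str], bool],
    message: str,
) -> Dict[str, bool]:
    """Re-evaluate only the candidate predicates; all others keep their value."""
    new_state = dict(state)  # frame assumption: start from the prior values
    for predicate in candidates:
        new_state[predicate] = evaluate(predicate, message)
    return new_state


if __name__ == "__main__":
    prior = {"sensitive_info": False, "publish_content": False, "user_authenticated": True}
    retrieved = {"publish_content"}   # assumed result of similarity retrieval
    required = {"sensitive_info"}     # assumed static dependency
    print(update_state(prior, retrieved | required, keyword_evaluator,
                       "Post the password list to the blog"))
```

Here `user_authenticated` is never re-evaluated because it is outside the candidate set, which is the point of the frame assumption.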

2.2 Threat Watcher

The Threat Watcher assigns per-agent risk scores, updated with exponential decay. Increasing risk dynamically expands the State Tracker's evaluation budget and raises denial thresholds in the Referee module.
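The exact update rule is not reproduced here, so the following sketch uses a generic exponentially weighted form, $r \leftarrow \lambda r + (1 - \lambda) s$, purely as an assumption. The decay factor `decay`, the per-step threat signal `signal`, and the function names are all hypothetical.

```python
from typing import Iterable, List


def update_risk(prev_risk: float, signal: float, decay: float = 0.8) -> float:
    """Assumed exponentially weighted update: old risk fades, new signal blends in."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return decay * prev_risk + (1.0 - decay) * signal


def risk_trajectory(signals: Iterable[float], decay: float = 0.8) -> List[float]:
    """Apply update_risk over a sequence of per-step threat signals."""
    risk = 0.0
    history = []
    for signal in signals:
        risk = update_risk(risk, signal, decay)
        history.append(risk)
    return history


if __name__ == "__main__":
    # One suspicious step followed by two benign ones: risk rises, then decays.
    print(risk_trajectory([1.0, 0.0, 0.0]))
```

Under this assumed form, a single suspicious message raises an agent's score and subsequent benign messages let it decay geometrically.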

2.3 Policy Verifier

The Policy Verifier selects the currently active rules and evaluates the relevant sequents, returning any violated rules along with minimal counterexamples. Under uncertainty, it triggers additional state expansion.
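As a rough illustration, suppose every rule forbids a conjunction of predicates, as in the data leakage example. A rule can then be treated as active when it mentions a predicate touched in the current step, and its forbidden conjunction doubles as a minimal counterexample. Both the rule encoding and the activation criterion are assumptions made for this sketch, as are the names `FORBIDDEN` and `verify`.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Each entry encodes not(p1 and p2 and ...): the listed predicates must not
# all hold at once. The second rule is a made-up example.
FORBIDDEN: Dict[str, FrozenSet[str]] = {
    "no_data_leak": frozenset({"sensitive_info", "publish_content"}),
    "no_unverified_payment": frozenset({"payment_requested", "user_unverified"}),
}


def verify(gamma: Set[str], touched: Set[str]) -> List[Tuple[str, FrozenSet[str]]]:
    """Return (rule name, counterexample) for each active rule that gamma violates."""
    violations = []
    for name, conjunction in sorted(FORBIDDEN.items()):
        if not (conjunction & touched):
            continue  # rule is inactive: none of its predicates changed
        if conjunction <= gamma:
            # The forbidden conjunction itself witnesses the violation.
            violations.append((name, conjunction))
    return violations


if __name__ == "__main__":
    gamma = {"sensitive_info", "publish_content", "payment_requested"}
    print(verify(gamma, touched={"publish_content"}))
```

Skipping inactive rules keeps the per-step work proportional to what changed rather than to the size of the whole rule set.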

2.4 Referee

The Referee adjudicates enforcement decisions based on the signals produced upstream in the pipeline, using risk-adaptive thresholds. Hierarchical execution begins with a lightweight LLM pass (e.g., GPT-4o-mini), escalating to a larger LLM for denials or ambiguous cases. The Referee emits allow/deny signals and human-readable rationales.
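The escalation pattern can be sketched with two interchangeable judges. This is only an outline of the control flow: `light_judge` and `heavy_judge` are hypothetical keyword-based stand-ins for the small and large LLM calls, and the `Verdict` type is an assumption.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    allow: bool
    confident: bool
    rationale: str


Judge = Callable[[str], Verdict]


def referee(message: str, light: Judge, heavy: Judge) -> Verdict:
    """Accept confident allows from the light judge; escalate everything else."""
    first = light(message)
    if first.allow and first.confident:
        return first
    return heavy(message)  # denials and ambiguous cases get a second opinion


def light_judge(message: str) -> Verdict:
    """Stand-in for the lightweight pass: flags any mention of a credential."""
    if "password" in message.lower():
        return Verdict(False, True, "mentions a credential")
    return Verdict(True, True, "no risky keyword found")


def heavy_judge(message: str) -> Verdict:
    """Stand-in for the larger model: denies only if the credential is published."""
    text = message.lower()
    if "password" in text and "publish" in text:
        return Verdict(False, True, "would publish a credential")
    return Verdict(True, True, "credential mentioned but not published")


if __name__ == "__main__":
    for msg in ("Summarise the meeting notes",
                "Publish the password file",
                "Remind me to rotate my password"):
        print(msg, "->", referee(msg, light_judge, heavy_judge))
```

The third message shows the intended effect: the cheap pass denies it, and the escalation overturns that denial, which is one way hierarchical execution can lower false positives.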

Guard Pipeline (Algorithm 1)

Algorithm 1 composes the four subagents above into a single per-message pipeline that ends in the Referee's allow/deny decision.
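One way a single guard step might compose the four subagents is sketched below. The ordering (track, watch, verify, judge), the function signatures, and the toy stand-ins are all assumptions for illustration; this is not a reproduction of Algorithm 1.

```python
from typing import Callable, Dict, List, Set, Tuple


def guard_step(
    message: str,
    agent: str,
    state: Set[str],
    risk: Dict[str, float],
    track: Callable[[Set[str], str], Set[str]],
    watch: Callable[[float, str], float],
    verify: Callable[[Set[str]], List[str]],
    judge: Callable[[List[str], float], bool],
) -> Tuple[bool, Set[str], Dict[str, float]]:
    """Run one intercepted message through the four (assumed) stages."""
    new_state = track(state, message)                        # State Tracker
    new_risk = dict(risk)
    new_risk[agent] = watch(risk.get(agent, 0.0), message)   # Threat Watcher
    violations = verify(new_state)                           # Policy Verifier
    allow = judge(violations, new_risk[agent])               # Referee
    return allow, new_state, new_risk


# Toy stand-ins so the sketch runs end to end.
def toy_track(state: Set[str], message: str) -> Set[str]:
    updated = set(state)
    if "post" in message.lower():
        updated.add("publish_content")
    return updated


def toy_watch(prev: float, message: str) -> float:
    signal = 1.0 if "password" in message.lower() else 0.0
    return 0.8 * prev + 0.2 * signal


def toy_verify(state: Set[str]) -> List[str]:
    leak = {"sensitive_info", "publish_content"} <= state
    return ["no_data_leak"] if leak else []


def toy_judge(violations: List[str], risk: float) -> bool:
    return not violations and risk < 0.5


if __name__ == "__main__":
    print(guard_step("Post the report", "writer", {"sensitive_info"}, {},
                     toy_track, toy_watch, toy_verify, toy_judge))
```

Because the stages are passed in as callables, the guard wraps the message flow without touching the agents themselves.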

3. Policy Compilation and Online Enforcement

In the offline phase, a powerful LLM (GPT-4o) parses free-form policy text into a predicate set $\mathcal{P}$ and a set of propositional rules, with human-in-the-loop review for correctness. An embedding index over predicate schemas (supporting Top-$k$ retrieval) and a static dependency graph are built to enable efficient runtime selection.

At runtime, a Top-$k$ predicate updater uses approximate nearest neighbor search (HNSW) to select the relevant predicates, reducing the per-step cost to that of an LLM pass over the retrieved candidates.
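The retrieval step can be illustrated with exact cosine-similarity search over a tiny index. Note the substitution: this sketch uses brute-force search in place of HNSW, and the three-dimensional vectors are made-up toy embeddings rather than real model outputs.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k predicate names most similar to the query, best first."""
    return heapq.nlargest(k, index, key=lambda name: cosine(query, index[name]))


# Toy predicate embeddings (hypothetical values).
INDEX = {
    "sensitive_info": [1.0, 0.0, 0.0],
    "publish_content": [0.0, 1.0, 0.0],
    "payment_requested": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    message_embedding = [0.9, 0.4, 0.0]  # assumed embedding of the message
    print(top_k(message_embedding, INDEX, k=2))
```

Exact search scans every predicate, which is fine for a handful of entries; an approximate index such as HNSW becomes worthwhile as the predicate set grows.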

Referee conflict resolution is governed by per-rule severity scores: a rule violation triggers a denial when its severity crosses a threshold that depends on the agent's current risk. Hierarchical LLM escalation reduces false positives while maintaining recall.

4. Evaluation and Empirical Results

QuadSentinel is evaluated on ST-WebAgentBench (256 tasks, 646 policies) and AgentHarm (176 tasks, 11 harm categories), using both AWM and Magentic-One base agents. Baselines for comparison include Prompt-based Guard, ShieldAgent, and GuardAgent. Models employed are GPT-4o / GPT-4o-mini, with ablation studies using Qwen3-235B.

4.1 Main Performance

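All values are percentages. Columns 2 to 5 report ST-WebAgentBench; columns 6 to 9 report AgentHarm.

| Guardrail | Acc | Prec. | Rec. | FPR | Acc | Prec. | Rec. | FPR |
|---|---|---|---|---|---|---|---|---|
| ShieldAgent | 91.1 | 81.6 | 74.1 | 4.4 | 86.9 | 95.2 | 77.7 | 3.9 |
| GuardAgent | 84.0 | 91.9 | 74.6 | 6.6 | 78.4 | 93.7 | 60.9 | 4.1 |
| Prompt-based | 77.9 | 52.7 | 84.2 | 24.2 | 88.6 | 94.2 | 82.4 | 5.2 |
| QuadSentinel (Ours) | 93.6 | 88.9 | 84.2 | 3.4 | 91.5 | 97.4 | 85.2 | 2.3 |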

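QuadSentinel attains the highest accuracy and the lowest FPR on both benchmarks. On AgentHarm it also leads in precision and recall. On ST-WebAgentBench it ties the prompt-based guard for the best recall (84.2), while GuardAgent has higher precision (91.9 vs. 88.9).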
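For reference, the four reported metrics follow the standard confusion-matrix definitions, sketched below with "block" treated as the positive class. The counts in the usage example are invented for illustration and are not taken from either benchmark.

```python
from typing import Dict


def guard_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and false positive rate from raw counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Guard against empty classes rather than raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),  # of blocked actions, how many were unsafe
        "recall": ratio(tp, tp + fn),     # of unsafe actions, how many were blocked
        "fpr": ratio(fp, fp + tn),        # of safe actions, how many were blocked
    }


if __name__ == "__main__":
    # Hypothetical counts: 10 unsafe actions (8 caught), 20 safe actions (2 blocked).
    print(guard_metrics(tp=8, fp=2, tn=18, fn=2))
```

FPR matters here because every false positive is a legitimate agent action that the guard wrongly blocked.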

4.2 Runtime Efficiency

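| Variant | Additional tokens | Token overhead | Additional time (s) | Time overhead |
|---|---|---|---|---|
| Base Agent | | | 10,699 (baseline) | |
| QuadSentinel | +3.7 M | 0.95× | +3,550 | 0.33× |
| ShieldAgent | +7.3 M | 1.86× | +6,640 | 0.62× |
| GuardAgent | +10.4 M | 2.66× | +13,302 | 1.24× |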

QuadSentinel adds the least token and time overhead of the three guards, roughly half that of ShieldAgent on both measures.
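The time-overhead column can be checked directly from the table, assuming the base agent's 10,699 figure is its total runtime in seconds and that overhead is defined as additional time divided by base time. The constant and function names below are hypothetical.

```python
# Values copied from the runtime table; the base figure is read as seconds.
BASE_TIME_S = 10_699
EXTRA_TIME_S = {
    "QuadSentinel": 3_550,
    "ShieldAgent": 6_640,
    "GuardAgent": 13_302,
}


def time_overhead(extra_s: float, base_s: float = BASE_TIME_S) -> float:
    """Overhead expressed as a multiple of the base agent's runtime."""
    return extra_s / base_s


OVERHEAD = {name: round(time_overhead(extra), 2) for name, extra in EXTRA_TIME_S.items()}

if __name__ == "__main__":
    for name, value in OVERHEAD.items():
        print(f"{name}: {value:.2f}x")
```

Under that reading, the computed ratios round to the 0.33×, 0.62×, and 1.24× values reported in the table.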

4.3 Ablation Analysis

| Variant | Acc (%) | Prec (%) | Rec (%) | FPR (%) |
|---|---|---|---|---|
| Full System | 91.5 | 97.4 | 85.2 | 2.3 |
| Single Referee | 83.0 | 74.6 | 100 | 34.1 |
| No Threat Watcher | 86.4 | 90.0 | 81.8 | 9.1 |
| No Top-k Filter | 88.9 | 92.0 | 85.2 | 7.4 |

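Every ablation lowers accuracy and raises FPR relative to the full system. Collapsing the guard to a single Referee is the most damaging: recall reaches 100%, but FPR rises from 2.3% to 34.1%. Removing the Threat Watcher or the Top-k filter raises FPR to 9.1% and 7.4%, respectively.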

Per-category breakdown (Appendix C) confirms that QuadSentinel maintains the best tradeoff between utility and safety across harm categories.

5. Deployment Model and Limitations

QuadSentinel is deployable as a plug-in, wrapping existing multi-agent LLM infrastructures via authenticated message channels. No modifications to agent internals or retraining are required; policies reside in a distinct, cryptographically signed “policy book” loaded at system start.
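The source does not say which signature scheme protects the policy book. As a loose illustration of verifying the book before loading it, the sketch below uses an HMAC-SHA256 tag from the Python standard library. This is a symmetric integrity check, not a public-key signature, and the one-policy-per-line book format and function names are assumptions.

```python
import hashlib
import hmac
from typing import List


def sign_policy_book(policy_text: str, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the policy text (stand-in for a signature)."""
    return hmac.new(key, policy_text.encode("utf-8"), hashlib.sha256).hexdigest()


def load_policy_book(policy_text: str, tag: str, key: bytes) -> List[str]:
    """Return the policies, one per non-empty line, only if the tag verifies."""
    expected = sign_policy_book(policy_text, key)
    if not hmac.compare_digest(expected, tag):  # constant-time comparison
        raise ValueError("policy book failed integrity check")
    return [line.strip() for line in policy_text.splitlines() if line.strip()]


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    book = "Do not publish sensitive info\n"
    tag = sign_policy_book(book, key)
    print(load_policy_book(book, tag, key))
```

Keeping verification at load time matches the deployment model above: the guard reads its policies once at system start and refuses to run on a tampered book.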

Quality of policy translation is crucial; human refinement may be needed for ambiguous rules. As components depend on LLMs, they remain susceptible to adversarial prompt attacks, partially mitigated via prompt hardening and sanitization. The reliance on Top-$k$ filtering and a closed-world predicate set may fail to capture novel risks not encoded within $\mathcal{P}$.

The implementation will be released at https://github.com/yyiliu/QuadSentinel.

6. Conclusion

QuadSentinel establishes the first modular multi-agent guard framework for LLM systems, anchoring policy enforcement in formal logic through sequent-based obligations over machine-checkable predicates. Its four-agent design supports stateful, real-time safety with high accuracy, low false positive rates, and transparent rationales, outperforming single-agent baselines such as ShieldAgent. The approach enables practical, robust deployment to arbitrary agent ecosystems without intruding on agent architectures or requiring core modifications (Yang et al., 18 Dec 2025).
