QuadSentinel: Modular Guard for LLM Systems
- QuadSentinel is a modular multi-agent guard architecture that formalizes safety policies using Boolean predicates and sequents.
- It employs four specialized agents—State Tracker, Threat Watcher, Policy Verifier, and Referee—to monitor and control inter-agent actions in real time.
- Empirical evaluations on benchmarks demonstrate improved accuracy, precision, recall, and runtime efficiency compared to single-agent systems.
QuadSentinel is a modular multi-agent guard architecture for real-time enforcement of formal safety policies in LLM-based multi-agent systems. It converts ambiguous, context-dependent natural-language safety requirements into machine-checkable sequents over Boolean predicates, providing accurate and efficient online control over agent actions and inter-agent messages. The system leverages a four-agent collaboration—State Tracker, Threat Watcher, Policy Verifier, and Referee—each serving a distinct role in policy enforcement. QuadSentinel achieves improved guardrail accuracy, rule recall, and a reduced false positive rate (FPR) over prior single-agent baselines on benchmarks such as ST-WebAgentBench and AgentHarm, all without modifying core agent logic (Yang et al., 18 Dec 2025).
1. Sequent-Based Policy Formalism
QuadSentinel operationalizes safety as a set of formal sequents built over a finite collection of Boolean predicates $P$ that index observable environment state and agent interactions. At each timestep $t$, the system maintains $\Gamma_t \subseteq P$ as the set of currently true predicates, which is used to evaluate policy compliance.
Natural-language safety policies (e.g., "Do not publish sensitive info") are compiled offline into propositional logic rules, for instance a data-leakage rule of the form $\neg(\text{sensitive\_content} \wedge \text{publish\_action})$ over illustrative predicates.
These rules are restated as safety obligations $\psi_r$, with sequents $\Gamma_t \vdash \psi_r$ defining at runtime whether the current state and interaction satisfy the policy. The implication
$$\Gamma_t \Rightarrow \psi_r$$
is checked for each active rule $r$; failure triggers an enforcement block. For example, if both the sensitive-content predicate and the publish-action predicate are true in $\Gamma_t$, the data-leakage rule is violated and the action is blocked.
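The check can be sketched as set containment, assuming (as the rule form above suggests) that each rule forbids a conjunction of predicates; the predicate names are hypothetical, not taken from the paper:

```python
def violates(state, forbidden):
    """A forbidden-conjunction rule fires when every predicate it names
    is currently true in the state."""
    return forbidden <= state

# Hypothetical data-leakage rule (predicate names are illustrative).
LEAK_RULE = frozenset({"contains_sensitive_info", "action_is_publish"})

state = {"contains_sensitive_info", "action_is_publish", "user_logged_in"}
assert violates(state, LEAK_RULE)                       # both hold -> block
assert not violates({"action_is_publish"}, LEAK_RULE)   # no sensitive content -> allow
```

More general propositional rules reduce to the same pattern by putting each rule into a disjunction of forbidden conjunctions.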
2. Four-Agent Guard Structure
QuadSentinel's guard intercepts all inter-agent messages or actions and maintains a complete control state $\Theta_t = (P, R, \Gamma_t, \tau_t)$, where $P$ is the predicate set, $R$ the rule set, $\Gamma_t$ the currently true predicates, and $\tau_t$ the per-agent risk scores. The pipeline implements the following specialized subagents:
2.1 State Tracker
The State Tracker maintains $\Gamma_t$ by updating only a candidate set of predicates $C_t = \text{TopK}(m_t, P) \cup \text{MustCheck}(m_t)$, where $\text{TopK}(m_t, P)$ comprises the top-$k$ predicates retrieved via semantic similarity to the message $m_t$, and $\text{MustCheck}(m_t)$ are predicates required by static dependencies. Updates are performed by a high-precision LLM, with all non-candidate predicates retaining their prior values (frame assumption).
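A minimal sketch of the frame-preserving update, with hypothetical predicate names and a stubbed evaluator standing in for the LLM pass:

```python
def update_state(prev_state, candidates, evaluate):
    """Re-evaluate only candidate predicates; every other predicate keeps
    its previous truth value (the frame assumption)."""
    new_state = dict(prev_state)
    for p in candidates:
        new_state[p] = evaluate(p)
    return new_state

prev = {"sensitive": False, "publish": False, "logged_in": True}
top_k = {"sensitive"}       # retrieved by semantic similarity (assumed)
must_check = {"publish"}    # forced by static dependencies (assumed)
state = update_state(prev, top_k | must_check, lambda p: p == "sensitive")
assert state == {"sensitive": True, "publish": False, "logged_in": True}
```

Note that `logged_in` is never re-evaluated: only the candidate set pays LLM cost each step.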
2.2 Threat Watcher
The Threat Watcher assigns per-agent risk scores $\tau_t$ via exponential-decay updates of the form $\tau_t = \lambda\,\tau_{t-1} + (1-\lambda)\,s_t$, where $s_t$ is the current step's risk signal and $\lambda \in (0,1)$ the decay factor. Increasing risk dynamically expands the State Tracker's evaluation budget and raises denial thresholds in the Referee module.
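A sketch of the exponentially weighted update; the decay constant is an assumption for illustration, not a value from the paper:

```python
def update_risk(prev_risk, signal, decay=0.8):
    """Exponentially weighted risk: old evidence decays each step while new
    evidence accumulates. `decay` is an assumed constant."""
    return decay * prev_risk + (1.0 - decay) * signal

risk = 0.0
for s in (1.0, 1.0, 0.0):        # two risky steps, then a benign one
    risk = update_risk(risk, s)
assert abs(risk - 0.288) < 1e-9  # risk persists after the benign step
```

The key property is memory: a single benign step does not reset a previously risky agent's score, so heightened scrutiny persists.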
2.3 Policy Verifier
The Policy Verifier selects currently active rules and evaluates the relevant sequents, returning any violated rules along with minimal counterexamples. Upon uncertainty, it triggers additional state expansion.
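Assuming the forbidden-conjunction encoding sketched earlier, a minimal counterexample for a violated rule is the subset of state predicates that witnesses the violation (rule names are hypothetical):

```python
def check_rules(state, rules):
    """Return each violated rule together with the minimal subset of state
    predicates witnessing the violation (forbidden-conjunction encoding)."""
    return {name: set(forbidden)
            for name, forbidden in rules.items()
            if forbidden <= state}

rules = {"no_leak": frozenset({"sensitive", "publish"}),
         "no_bulk": frozenset({"bulk_send"})}
violated = check_rules({"sensitive", "publish", "logged_in"}, rules)
assert violated == {"no_leak": {"sensitive", "publish"}}
```

Returning the witnessing predicates, rather than just a rule name, is what lets the Referee produce a human-readable rationale downstream.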
2.4 Referee
The Referee adjudicates enforcement decisions based on the message $m_t$, risk scores $\tau_t$, state $\Gamma_t$, and violated rules $R_{\text{viol}}$, using risk-adaptive thresholds. Hierarchical execution begins with a lightweight LLM pass (e.g., GPT-4o-mini), escalating to a larger LLM for denials or ambiguous cases. The Referee emits allow/deny signals and human-readable rationales.
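The escalation logic can be sketched as follows; the judge interfaces and the confidence floor are assumptions, not the paper's exact API:

```python
def referee(message, violations, cheap_judge, strong_judge, conf_floor=0.7):
    """Two-tier sketch: a lightweight judge runs first; a denial or a
    low-confidence verdict escalates to the stronger judge."""
    verdict, confidence = cheap_judge(message, violations)
    if verdict == "deny" or confidence < conf_floor:
        verdict, confidence = strong_judge(message, violations)
    return verdict, confidence

# Stub judges standing in for GPT-4o-mini / GPT-4o calls.
cheap = lambda m, v: ("deny", 0.9) if v else ("allow", 0.95)
strong = lambda m, v: ("deny", 0.99) if v else ("allow", 0.99)

assert referee("msg", {"no_leak"}, cheap, strong) == ("deny", 0.99)  # escalated
assert referee("msg", set(), cheap, strong) == ("allow", 0.95)       # cheap pass only
```

The asymmetry is deliberate: confident allows stay cheap, while every denial pays for a second opinion, which is how the design trades a little latency for a lower FPR.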
Guard Pipeline (Algorithm 1)
```python
def guard_pipeline(m_t, Theta_prev):
    P, R, Gamma_prev, tau_prev = Theta_prev                 # unpack control state
    C_t = TopK(m_t, P) | MustCheck(m_t)                     # candidate predicates
    Gamma_t = EvalPredicates(C_t, Gamma_prev)               # frame-preserving update
    tau_t = UpdateThreat(m_t, tau_prev, Gamma_t)            # per-agent risk
    R_app = select_active_rules(m_t)
    R_viol = {r for r in R_app if not implies(Gamma_t, psi(r))}
    d_t, T_t = Referee(m_t, tau_t, Gamma_t, R_viol)         # decision + rationale
    return d_t, (P, R, Gamma_t, tau_t), T_t
```
3. Policy Compilation and Online Enforcement
In the offline phase, powerful LLMs (GPT-4o) parse free-form policy text into a predicate set $P$ and propositional rules $R$, with human-in-the-loop review for correctness. An embedding index over predicate schemas (supporting Top-$k$ retrieval) and a static dependency graph are built to facilitate efficient runtime selection.
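One plausible shape for the compiled artifacts, assuming the forbidden-conjunction rule encoding used above; the field names are assumptions, not the paper's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    """Schema entry for one Boolean predicate: the description feeds the
    embedding index, depends_on feeds the static dependency graph."""
    name: str
    description: str
    depends_on: tuple = ()

@dataclass(frozen=True)
class Rule:
    """Compiled propositional rule: a forbidden conjunction plus severity."""
    name: str
    forbidden: frozenset
    severity: float

sens = Predicate("sensitive_content", "message contains PII or secrets")
leak = Rule("no_leak", frozenset({"sensitive_content", "publish_action"}), 0.9)
assert leak.severity == 0.9 and sens.depends_on == ()
```

Freezing both types keeps the compiled policy book immutable at runtime, which matters once it is signed and loaded.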
At runtime, the Top-$k$ predicate updater leverages approximate nearest-neighbor search (HNSW) to select relevant predicates in roughly $O(\log |P|)$ time, reducing the per-step cost to that of an LLM pass over the $O(k)$ candidate predicates rather than all of $P$, since $k \ll |P|$.
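A brute-force stand-in for the retrieval step, with toy two-dimensional embeddings; in deployment an HNSW index replaces the exhaustive sort with approximate logarithmic search:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_predicates(query_vec, predicate_vecs, k):
    """Rank predicate embeddings by cosine similarity to the message
    embedding and keep the top k (exhaustive stand-in for ANN search)."""
    ranked = sorted(predicate_vecs,
                    key=lambda p: cosine(query_vec, predicate_vecs[p]),
                    reverse=True)
    return ranked[:k]

vecs = {"sensitive": (1.0, 0.0), "publish": (0.9, 0.1), "logged_in": (0.0, 1.0)}
assert set(top_k_predicates((1.0, 0.05), vecs, 2)) == {"sensitive", "publish"}
```

The embedding model and vectors here are placeholders; only the ranking-and-truncation pattern carries over.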
Referee conflict resolution is governed by per-rule severity scores $\sigma_r$; any rule violation whose severity exceeds a risk-adaptive threshold, determined by the agent's current risk $\tau_t$, triggers a denial. Hierarchical LLM escalation reduces false positives while maintaining recall.
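A sketch of severity-gated denial; the linear threshold form and constants are assumptions, since the paper only states that severity and the agent's current risk govern denials:

```python
def resolve(violations, severity, risk, base=0.7):
    """Deny when any violated rule's severity clears a risk-adaptive bar
    (assumed linear form: riskier agents face a lower bar)."""
    threshold = base * (1.0 - risk)
    hits = [r for r in violations if severity[r] >= threshold]
    return ("deny", hits) if hits else ("allow", [])

severity = {"no_leak": 0.9, "style_hint": 0.2}
assert resolve({"style_hint"}, severity, risk=0.0)[0] == "allow"  # 0.2 < 0.7
assert resolve({"style_hint"}, severity, risk=0.8)[0] == "deny"   # bar drops to 0.14
```

The same low-severity violation is tolerated for a trusted agent but blocked for a high-risk one, which is the intended coupling between the Threat Watcher and the Referee.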
4. Evaluation and Empirical Results
QuadSentinel is evaluated on ST-WebAgentBench (256 tasks, 646 policies) and AgentHarm (176 tasks, 11 harm categories), using both AWM and Magentic-One base agents. Baselines for comparison include Prompt-based Guard, ShieldAgent, and GuardAgent. Models employed are GPT-4o / GPT-4o-mini, with ablation studies using Qwen3-235B.
4.1 Main Performance
| Guardrail | ST-WebAgentBench: Acc | Prec. | Rec. | FPR | AgentHarm: Acc | Prec. | Rec. | FPR |
|---|---|---|---|---|---|---|---|---|
| ShieldAgent | 91.1 | 81.6 | 74.1 | 4.4 | 86.9 | 95.2 | 77.7 | 3.9 |
| GuardAgent | 84.0 | 91.9 | 74.6 | 6.6 | 78.4 | 93.7 | 60.9 | 4.1 |
| Prompt-based | 77.9 | 52.7 | 84.2 | 24.2 | 88.6 | 94.2 | 82.4 | 5.2 |
| QuadSentinel (Ours) | 93.6 | 88.9 | 84.2 | 3.4 | 91.5 | 97.4 | 85.2 | 2.3 |
QuadSentinel achieves the highest accuracy and recall and the lowest FPR on both benchmarks; its precision leads on AgentHarm and is second only to GuardAgent on ST-WebAgentBench.
4.2 Runtime Efficiency
| Variant | Added Tokens | Token Overhead | Added Time (s) | Time Overhead |
|---|---|---|---|---|
| Base Agent | – | – | 10 699 | – |
| QuadSentinel | +3.7 M | 0.95× | +3 550 | 0.33× |
| ShieldAgent | +7.3 M | 1.86× | +6 640 | 0.62× |
| GuardAgent | +10.4 M | 2.66× | +13 302 | 1.24× |
QuadSentinel incurs the lowest additional token and time overhead of the compared guards.
4.3 Ablation Analysis
| Variant | Acc (%) | Prec (%) | Rec (%) | FPR (%) |
|---|---|---|---|---|
| Full System | 91.5 | 97.4 | 85.2 | 2.3 |
| Single Referee | 83.0 | 74.6 | 100 | 34.1 |
| No Threat Watcher | 86.4 | 90.0 | 81.8 | 9.1 |
| No Top-k Filter | 88.9 | 92.0 | 85.2 | 7.4 |
All four agents are critical for optimal performance and minimal false positives.
Per-category breakdown (Appendix C) confirms that QuadSentinel maintains the best tradeoff between utility and safety across harm categories.
5. Deployment Model and Limitations
QuadSentinel is deployable as a plug-in, wrapping existing multi-agent LLM infrastructures via authenticated message channels. No modifications to agent internals or retraining are required; policies reside in a distinct, cryptographically signed “policy book” loaded at system start.
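A minimal sketch of policy-book integrity checking at load time, using stdlib HMAC-SHA256; this is an illustrative scheme, since the paper states only that the policy book is cryptographically signed:

```python
import hashlib
import hmac
import json

def sign_policy_book(book, key):
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding."""
    payload = json.dumps(book, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def load_policy_book(book, signature, key):
    """Refuse to load a policy book whose signature does not verify."""
    if not hmac.compare_digest(sign_policy_book(book, key), signature):
        raise ValueError("policy book signature mismatch")
    return book

key = b"shared-deployment-key"   # assumed key-distribution model
book = {"rules": [{"name": "no_leak", "severity": 0.9}]}
sig = sign_policy_book(book, key)
assert load_policy_book(book, sig, key) == book
```

A tampered book (e.g., a rule silently dropped) fails `compare_digest` and is rejected before the guard starts, keeping policy integrity independent of the wrapped agents.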
Quality of policy translation is crucial; human refinement may be needed for ambiguous rules. As components depend on LLMs, they remain susceptible to adversarial prompt attacks, partially mitigated via prompt hardening and sanitization. The reliance on Top-$k$ filtering and a closed-world predicate set may fail to capture novel risks not encoded within $P$.
The implementation will be released at https://github.com/yyiliu/QuadSentinel.
6. Conclusion
QuadSentinel establishes the first modular multi-agent guard framework for LLM systems, anchoring policy enforcement in formal logic through sequent-based obligations over machine-checkable predicates. Its four-agent design supports stateful, real-time safety with high accuracy, low false positive rates, and transparent rationales, outperforming single-agent baselines such as ShieldAgent. The approach enables practical, robust deployment to arbitrary agent ecosystems without intruding on agent architectures or requiring core modifications (Yang et al., 18 Dec 2025).