QuadSentinel: Modular Guard for LLM Systems
- QuadSentinel is a modular multi-agent guard architecture that formalizes safety policies using Boolean predicates and sequents.
- It employs four specialized agents—State Tracker, Threat Watcher, Policy Verifier, and Referee—to monitor and control inter-agent actions in real time.
- Empirical evaluations on benchmarks demonstrate improved accuracy, precision, recall, and runtime efficiency compared to single-agent systems.
QuadSentinel is a modular multi-agent guard architecture for real-time enforcement of formal safety policies in LLM-based multi-agent systems. It converts ambiguous, context-dependent natural-language safety requirements into machine-checkable sequents over Boolean predicates, providing accurate and efficient online control over agent actions and inter-agent messages. The system leverages a four-agent collaboration—State Tracker, Threat Watcher, Policy Verifier, and Referee—each serving a distinct role in policy enforcement. QuadSentinel achieves improved guardrail accuracy, rule recall, and a reduced false positive rate (FPR) over prior single-agent baselines on benchmarks such as ST-WebAgentBench and AgentHarm, all without modifying core agent logic (Yang et al., 18 Dec 2025).
1. Sequent-Based Policy Formalism
QuadSentinel operationalizes safety as a set of formal sequents built over a finite collection of Boolean predicates $P$ that index observable environment state and agent interactions. At each timestep $t$, the system maintains $\Gamma_t \subseteq P$ as the set of currently true predicates, which is used to evaluate policy compliance.
Natural-language safety policies (e.g., "Do not publish sensitive info") are compiled offline into propositional logic rules, for instance a data-leakage rule of the form $\neg(\text{sensitive\_content} \wedge \text{publish\_action})$ over illustrative predicates.
These rules are restated as safety obligations $\psi_r$, with sequents $\Gamma_t \vdash \psi_r$ defining at runtime whether the current state and interaction satisfy the policy. The implication
$$\Gamma_t \Rightarrow \psi_r$$
is checked for each active rule $r$; failure triggers an enforcement block. For example, if both the sensitive-content predicate and the publish-action predicate are true in $\Gamma_t$, the data-leakage rule is violated and the action is blocked.
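The check can be sketched as set containment, assuming (as the rule form above suggests) that each rule forbids a conjunction of predicates; the predicate names are hypothetical, not taken from the paper:

```python
def violates(state, forbidden):
    """A forbidden-conjunction rule fires when every predicate it names
    is currently true in the state."""
    return forbidden <= state

# Hypothetical data-leakage rule (predicate names are illustrative).
LEAK_RULE = frozenset({"contains_sensitive_info", "action_is_publish"})

state = {"contains_sensitive_info", "action_is_publish", "user_logged_in"}
assert violates(state, LEAK_RULE)                       # both hold -> block
assert not violates({"action_is_publish"}, LEAK_RULE)   # no sensitive content -> allow
```

More general propositional rules reduce to the same pattern by putting each rule into a disjunction of forbidden conjunctions.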
2. Four-Agent Guard Structure
QuadSentinel's guard intercepts all inter-agent messages or actions and maintains a complete control state $\Theta_t = (P, R, \Gamma_t, \tau_t)$, where $P$ is the predicate set, $R$ the rule set, $\Gamma_t$ the currently true predicates, and $\tau_t$ the per-agent risk scores. The pipeline implements the following specialized subagents:
2.1 State Tracker
The State Tracker maintains $\Gamma_t$ by updating only a candidate set of predicates $C_t = \text{TopK}(m_t, P) \cup \text{MustCheck}(m_t)$, where $\text{TopK}(m_t, P)$ comprises the top-$k$ predicates retrieved via semantic similarity to the message $m_t$, and $\text{MustCheck}(m_t)$ are predicates required by static dependencies. Updates are performed by a high-precision LLM, with all non-candidate predicates retaining their prior values (frame assumption).
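A minimal sketch of the frame-preserving update, with hypothetical predicate names and a stubbed evaluator standing in for the LLM pass:

```python
def update_state(prev_state, candidates, evaluate):
    """Re-evaluate only candidate predicates; every other predicate keeps
    its previous truth value (the frame assumption)."""
    new_state = dict(prev_state)
    for p in candidates:
        new_state[p] = evaluate(p)
    return new_state

prev = {"sensitive": False, "publish": False, "logged_in": True}
top_k = {"sensitive"}       # retrieved by semantic similarity (assumed)
must_check = {"publish"}    # forced by static dependencies (assumed)
state = update_state(prev, top_k | must_check, lambda p: p == "sensitive")
assert state == {"sensitive": True, "publish": False, "logged_in": True}
```

Note that `logged_in` is never re-evaluated: only the candidate set pays LLM cost each step.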
2.2 Threat Watcher
The Threat Watcher assigns per-agent risk scores $\tau_t$ via exponential-decay updates of the form $\tau_t = \lambda\,\tau_{t-1} + (1-\lambda)\,s_t$, where $s_t$ is the current step's risk signal and $\lambda \in (0,1)$ the decay factor. Increasing risk dynamically expands the State Tracker's evaluation budget and raises denial thresholds in the Referee module.
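A sketch of the exponentially weighted update; the decay constant is an assumption for illustration, not a value from the paper:

```python
def update_risk(prev_risk, signal, decay=0.8):
    """Exponentially weighted risk: old evidence decays each step while new
    evidence accumulates. `decay` is an assumed constant."""
    return decay * prev_risk + (1.0 - decay) * signal

risk = 0.0
for s in (1.0, 1.0, 0.0):        # two risky steps, then a benign one
    risk = update_risk(risk, s)
assert abs(risk - 0.288) < 1e-9  # risk persists after the benign step
```

The key property is memory: a single benign step does not reset a previously risky agent's score, so heightened scrutiny persists.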
2.3 Policy Verifier
The Policy Verifier selects currently active rules and evaluates the relevant sequents, returning any violated rules along with minimal counterexamples. Upon uncertainty, it triggers additional state expansion.
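Assuming the forbidden-conjunction encoding sketched earlier, a minimal counterexample for a violated rule is the subset of state predicates that witnesses the violation (rule names are hypothetical):

```python
def check_rules(state, rules):
    """Return each violated rule together with the minimal subset of state
    predicates witnessing the violation (forbidden-conjunction encoding)."""
    return {name: set(forbidden)
            for name, forbidden in rules.items()
            if forbidden <= state}

rules = {"no_leak": frozenset({"sensitive", "publish"}),
         "no_bulk": frozenset({"bulk_send"})}
violated = check_rules({"sensitive", "publish", "logged_in"}, rules)
assert violated == {"no_leak": {"sensitive", "publish"}}
```

Returning the witnessing predicates, rather than just a rule name, is what lets the Referee produce a human-readable rationale downstream.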
2.4 Referee
The Referee adjudicates enforcement decisions based on the message $m_t$, risk scores $\tau_t$, state $\Gamma_t$, and violated rules $R_{\text{viol}}$, using risk-adaptive thresholds. Hierarchical execution begins with a lightweight LLM pass (e.g., GPT-4o-mini), escalating to a larger LLM for denials or ambiguous cases. The Referee emits allow/deny signals and human-readable rationales.
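The escalation logic can be sketched as follows; the judge interfaces and the confidence floor are assumptions, not the paper's exact API:

```python
def referee(message, violations, cheap_judge, strong_judge, conf_floor=0.7):
    """Two-tier sketch: a lightweight judge runs first; a denial or a
    low-confidence verdict escalates to the stronger judge."""
    verdict, confidence = cheap_judge(message, violations)
    if verdict == "deny" or confidence < conf_floor:
        verdict, confidence = strong_judge(message, violations)
    return verdict, confidence

# Stub judges standing in for GPT-4o-mini / GPT-4o calls.
cheap = lambda m, v: ("deny", 0.9) if v else ("allow", 0.95)
strong = lambda m, v: ("deny", 0.99) if v else ("allow", 0.99)

assert referee("msg", {"no_leak"}, cheap, strong) == ("deny", 0.99)  # escalated
assert referee("msg", set(), cheap, strong) == ("allow", 0.95)       # cheap pass only
```

The asymmetry is deliberate: confident allows stay cheap, while every denial pays for a second opinion, which is how the design trades a little latency for a lower FPR.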
Guard Pipeline (Algorithm 1)
```python
def guard_pipeline(m_t, Theta_prev):
    P, R, Gamma_prev, tau_prev = Theta_prev                 # unpack control state
    C_t = TopK(m_t, P) | MustCheck(m_t)                     # candidate predicates
    Gamma_t = EvalPredicates(C_t, Gamma_prev)               # frame-preserving update
    tau_t = UpdateThreat(m_t, tau_prev, Gamma_t)            # per-agent risk
    R_app = select_active_rules(m_t)
    R_viol = {r for r in R_app if not implies(Gamma_t, psi(r))}
    d_t, T_t = Referee(m_t, tau_t, Gamma_t, R_viol)         # decision + rationale
    return d_t, (P, R, Gamma_t, tau_t), T_t
```
3. Policy Compilation and Online Enforcement
In the offline phase, powerful LLMs (GPT-4o) parse free-form policy text into a predicate set $P$ and propositional rules $R$, with human-in-the-loop review for correctness. An embedding index over predicate schemas (supporting Top-$k$ retrieval) and a static dependency graph are built to facilitate efficient runtime selection.
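One plausible shape for the compiled artifacts, assuming the forbidden-conjunction rule encoding used above; the field names are assumptions, not the paper's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    """Schema entry for one Boolean predicate: the description feeds the
    embedding index, depends_on feeds the static dependency graph."""
    name: str
    description: str
    depends_on: tuple = ()

@dataclass(frozen=True)
class Rule:
    """Compiled propositional rule: a forbidden conjunction plus severity."""
    name: str
    forbidden: frozenset
    severity: float

sens = Predicate("sensitive_content", "message contains PII or secrets")
leak = Rule("no_leak", frozenset({"sensitive_content", "publish_action"}), 0.9)
assert leak.severity == 0.9 and sens.depends_on == ()
```

Freezing both types keeps the compiled policy book immutable at runtime, which matters once it is signed and loaded.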
At runtime, the Top-$k$ predicate updater leverages approximate nearest-neighbor search (HNSW) to select relevant predicates in roughly $O(\log |P|)$ time, reducing the per-step cost to that of an LLM pass over the $O(k)$ candidate predicates rather than all of $P$, since $k \ll |P|$.
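A brute-force stand-in for the retrieval step, with toy two-dimensional embeddings; in deployment an HNSW index replaces the exhaustive sort with approximate logarithmic search:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_predicates(query_vec, predicate_vecs, k):
    """Rank predicate embeddings by cosine similarity to the message
    embedding and keep the top k (exhaustive stand-in for ANN search)."""
    ranked = sorted(predicate_vecs,
                    key=lambda p: cosine(query_vec, predicate_vecs[p]),
                    reverse=True)
    return ranked[:k]

vecs = {"sensitive": (1.0, 0.0), "publish": (0.9, 0.1), "logged_in": (0.0, 1.0)}
assert set(top_k_predicates((1.0, 0.05), vecs, 2)) == {"sensitive", "publish"}
```

The embedding model and vectors here are placeholders; only the ranking-and-truncation pattern carries over.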
Referee conflict resolution is governed by per-rule severity scores $\sigma_r$; any rule violation whose severity exceeds a risk-adaptive threshold, determined by the agent's current risk $\tau_t$, triggers a denial. Hierarchical LLM escalation reduces false positives while maintaining recall.
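A sketch of severity-gated denial; the linear threshold form and constants are assumptions, since the paper only states that severity and the agent's current risk govern denials:

```python
def resolve(violations, severity, risk, base=0.7):
    """Deny when any violated rule's severity clears a risk-adaptive bar
    (assumed linear form: riskier agents face a lower bar)."""
    threshold = base * (1.0 - risk)
    hits = [r for r in violations if severity[r] >= threshold]
    return ("deny", hits) if hits else ("allow", [])

severity = {"no_leak": 0.9, "style_hint": 0.2}
assert resolve({"style_hint"}, severity, risk=0.0)[0] == "allow"  # 0.2 < 0.7
assert resolve({"style_hint"}, severity, risk=0.8)[0] == "deny"   # bar drops to 0.14
```

The same low-severity violation is tolerated for a trusted agent but blocked for a high-risk one, which is the intended coupling between the Threat Watcher and the Referee.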
4. Evaluation and Empirical Results
QuadSentinel is evaluated on ST-WebAgentBench (256 tasks, 646 policies) and AgentHarm (176 tasks, 11 harm categories), using both AWM and Magentic-One base agents. Baselines for comparison include Prompt-based Guard, ShieldAgent, and GuardAgent. Models employed are GPT-4o / GPT-4o-mini, with ablation studies using Qwen3-235B.
4.1 Main Performance
| Guardrail | ST-WebAgentBench: Acc | Prec. | Rec. | FPR | AgentHarm: Acc | Prec. | Rec. | FPR |
|---|---|---|---|---|---|---|---|---|
| ShieldAgent | 91.1 | 81.6 | 74.1 | 4.4 | 86.9 | 95.2 | 77.7 | 3.9 |
| GuardAgent | 84.0 | 91.9 | 74.6 | 6.6 | 78.4 | 93.7 | 60.9 | 4.1 |
| Prompt-based | 77.9 | 52.7 | 84.2 | 24.2 | 88.6 | 94.2 | 82.4 | 5.2 |
| QuadSentinel (Ours) | 93.6 | 88.9 | 84.2 | 3.4 | 91.5 | 97.4 | 85.2 | 2.3 |
QuadSentinel achieves the highest accuracy and recall and the lowest FPR on both benchmarks; its precision leads on AgentHarm and is second only to GuardAgent on ST-WebAgentBench.
4.2 Runtime Efficiency
| Variant | Added Tokens | Token Overhead | Added Time (s) | Time Overhead |
|---|---|---|---|---|
| Base Agent | – | – | 10 699 | – |
| QuadSentinel | +3.7 M | 0.95× | +3 550 | 0.33× |
| ShieldAgent | +7.3 M | 1.86× | +6 640 | 0.62× |
| GuardAgent | +10.4 M | 2.66× | +13 302 | 1.24× |
QuadSentinel incurs the lowest additional token and time overhead of the compared guards.
4.3 Ablation Analysis
| Variant | Acc (%) | Prec (%) | Rec (%) | FPR (%) |
|---|---|---|---|---|
| Full System | 91.5 | 97.4 | 85.2 | 2.3 |
| Single Referee | 83.0 | 74.6 | 100 | 34.1 |
| No Threat Watcher | 86.4 | 90.0 | 81.8 | 9.1 |
| No Top-k Filter | 88.9 | 92.0 | 85.2 | 7.4 |
All four agents are critical for optimal performance and minimal false positives.
Per-category breakdown (Appendix C) confirms that QuadSentinel maintains the best tradeoff between utility and safety across harm categories.
5. Deployment Model and Limitations
QuadSentinel is deployable as a plug-in, wrapping existing multi-agent LLM infrastructures via authenticated message channels. No modifications to agent internals or retraining are required; policies reside in a distinct, cryptographically signed “policy book” loaded at system start.
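A minimal sketch of policy-book integrity checking at load time, using stdlib HMAC-SHA256; this is an illustrative scheme, since the paper states only that the policy book is cryptographically signed:

```python
import hashlib
import hmac
import json

def sign_policy_book(book, key):
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding."""
    payload = json.dumps(book, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def load_policy_book(book, signature, key):
    """Refuse to load a policy book whose signature does not verify."""
    if not hmac.compare_digest(sign_policy_book(book, key), signature):
        raise ValueError("policy book signature mismatch")
    return book

key = b"shared-deployment-key"   # assumed key-distribution model
book = {"rules": [{"name": "no_leak", "severity": 0.9}]}
sig = sign_policy_book(book, key)
assert load_policy_book(book, sig, key) == book
```

A tampered book (e.g., a rule silently dropped) fails `compare_digest` and is rejected before the guard starts, keeping policy integrity independent of the wrapped agents.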
Quality of policy translation is crucial; human refinement may be needed for ambiguous rules. As components depend on LLMs, they remain susceptible to adversarial prompt attacks, partially mitigated via prompt hardening and sanitization. The reliance on Top-$k$ filtering and a closed-world predicate set may fail to capture novel risks not encoded within $P$.
The implementation will be released at https://github.com/yyiliu/QuadSentinel.
6. Conclusion
QuadSentinel establishes the first modular multi-agent guard framework for LLM systems, anchoring policy enforcement in formal logic through sequent-based obligations over machine-checkable predicates. Its four-agent design supports stateful, real-time safety with high accuracy, low false positive rates, and transparent rationales, outperforming single-agent baselines such as ShieldAgent. The approach enables practical, robust deployment to arbitrary agent ecosystems without intruding on agent architectures or requiring core modifications (Yang et al., 18 Dec 2025).