
QuadSentinel: Modular Guard for LLM Systems

Updated 25 December 2025
  • QuadSentinel is a modular multi-agent guard architecture that formalizes safety policies using Boolean predicates and sequents.
  • It employs four specialized agents—State Tracker, Threat Watcher, Policy Verifier, and Referee—to monitor and control inter-agent actions in real time.
  • Empirical evaluations on benchmarks demonstrate improved accuracy, precision, recall, and runtime efficiency compared to single-agent systems.

QuadSentinel is a modular multi-agent guard architecture for real-time enforcement of formal safety policies in large language model (LLM)-based multi-agent systems. It converts ambiguous, context-dependent natural language safety requirements into machine-checkable sequents over Boolean predicates, providing accurate and efficient online control over agent actions and inter-agent messages. The system uses four collaborating agents (State Tracker, Threat Watcher, Policy Verifier, and Referee), each serving a distinct role in policy enforcement. QuadSentinel achieves improved guardrail accuracy, rule recall, and a reduced false positive rate (FPR) over prior single-agent baselines on benchmarks such as ST-WebAgentBench and AgentHarm, all without modifying core agent logic (Yang et al., 18 Dec 2025).

1. Sequent-Based Policy Formalism

QuadSentinel operationalizes safety as a set of formal sequents built over a finite collection of Boolean predicates $\mathcal{P} = \{p_1, p_2, \dots\}$ that index observable environment state and agent interactions. At each timestep $t$, the system maintains $\Gamma_t \subseteq \mathcal{P}$ as the set of currently true predicates, which are used to evaluate policy compliance.

Natural-language safety policies (e.g., "Do not publish sensitive info") are compiled offline into propositional logic rules, such as

$$\phi \coloneqq \neg(\mathit{sensitive\_info} \wedge \mathit{publish\_content})$$

These are restated as safety obligations $\psi_\phi$, with sequents $\Gamma_t \vdash \psi_\phi$ defining at runtime whether the current state and interaction satisfy the policy. The implication

$$\left( \bigwedge_{p \in \Gamma_t} p \right) \Rightarrow \psi_\phi$$

is checked for each active rule; failure triggers an enforcement block. For example, if both $\mathit{sensitive\_info}$ and $\mathit{publish\_content}$ are true in $\Gamma_t$, the data leakage rule is violated and the action is blocked.
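The check can be illustrated with a minimal sketch. It assumes predicates are plain strings, that $\Gamma_t$ is a set of the predicates currently true (anything absent is treated as false), and that each obligation is a Python callable; `Rule`, `no_leak`, and `violated_rules` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

Predicates = FrozenSet[str]  # Gamma_t: the predicates currently true


@dataclass(frozen=True)
class Rule:
    name: str
    # psi returns True when the safety obligation holds under Gamma_t.
    psi: Callable[[Predicates], bool]


# phi := not (sensitive_info and publish_content)
no_leak = Rule(
    name="no_data_leak",
    psi=lambda g: not ("sensitive_info" in g and "publish_content" in g),
)


def violated_rules(gamma: Predicates, rules: List[Rule]) -> List[str]:
    """Return the names of rules whose obligation fails under gamma."""
    return [r.name for r in rules if not r.psi(gamma)]


safe_state = frozenset({"sensitive_info"})
leaky_state = frozenset({"sensitive_info", "publish_content"})
print(violated_rules(safe_state, [no_leak]))   # []
print(violated_rules(leaky_state, [no_leak]))  # ['no_data_leak']
```

Holding sensitive information alone does not violate the rule; the violation arises only when publishing is also true in the same state.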

2. Four-Agent Guard Structure

QuadSentinel's guard intercepts every inter-agent message or action $m_t$ and maintains the complete control state $\Theta_t = (\mathcal{P}, \mathcal{R}, \Gamma_t, \tau_t)$. The pipeline implements the following specialized subagents:

2.1 State Tracker

The State Tracker maintains $\Gamma_t$ by updating only a candidate set of predicates $C_t = S_t \cup M_t$, where $S_t$ comprises the top-$k$ predicates retrieved via semantic similarity to $m_t$, and $M_t$ contains the predicates required by static dependencies. Updates are performed by a high-precision LLM, and every non-candidate predicate retains its prior value (the frame assumption).
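A rough sketch of the candidate-set construction and the frame assumption follows. It assumes toy embedding vectors, exact cosine ranking in place of a real retrieval index, and a stub `evaluate` callable standing in for the LLM predicate evaluator; all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_set(msg_vec: List[float], pred_vecs: Dict[str, List[float]],
                  k: int, must_check: Set[str]) -> Set[str]:
    """C_t = S_t | M_t: top-k predicates by similarity plus static dependencies."""
    ranked = sorted(pred_vecs, key=lambda p: cosine(msg_vec, pred_vecs[p]), reverse=True)
    return set(ranked[:k]) | must_check


def update_state(gamma_prev: Set[str], candidates: Set[str],
                 evaluate: Callable[[str], bool]) -> Set[str]:
    """Re-evaluate candidates only; other predicates keep their prior value."""
    gamma = set(gamma_prev) - candidates             # untouched predicates carry over
    gamma |= {p for p in candidates if evaluate(p)}  # candidates take fresh values
    return gamma


# Toy 2-D embeddings (illustrative values only).
pred_vecs = {
    "sensitive_info": [1.0, 0.0],
    "publish_content": [0.0, 1.0],
    "user_consent": [0.7, 0.7],
}
C_t = candidate_set([0.0, 1.0], pred_vecs, k=1, must_check={"sensitive_info"})
gamma_t = update_state({"user_consent"}, C_t, evaluate=lambda p: p == "publish_content")
print(sorted(C_t))      # ['publish_content', 'sensitive_info']
print(sorted(gamma_t))  # ['publish_content', 'user_consent']
```

Here `user_consent` is never re-evaluated because it falls outside $C_t$, so it simply carries over from the previous state.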

2.2 Threat Watcher

The Threat Watcher assigns per-agent risk scores $\tau_t(u) \in [0, 1]$ via exponential decay updates with smoothing factor $\alpha$:

$$\tau_{t+1}(u) = \alpha\,\tau_t(u) + (1-\alpha)\,\mathrm{risk}(m_t, \Gamma_t)$$

Increasing risk dynamically expands the State Tracker's evaluation budget and raises denial thresholds in the Referee module.
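The update rule is a standard exponentially weighted average, sketched below. The value $\alpha = 0.8$ and the sequence of per-message risk values are illustrative assumptions, and `update_threat` is a hypothetical name.

```python
def update_threat(tau: float, risk: float, alpha: float = 0.8) -> float:
    """tau_{t+1}(u) = alpha * tau_t(u) + (1 - alpha) * risk(m_t, Gamma_t)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * tau + (1.0 - alpha) * risk


tau = 0.0
history = []
for risk in [0.0, 1.0, 1.0, 0.0]:  # assumed per-message risk estimates
    tau = update_threat(tau, risk)
    history.append(round(tau, 3))
print(history)  # [0.0, 0.2, 0.36, 0.288]
```

Two risky messages push the score up, and it decays gradually rather than resetting once the agent behaves again. Because both inputs lie in $[0, 1]$, the score stays in $[0, 1]$.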

2.3 Policy Verifier

The Policy Verifier selects currently active rules $\mathcal{R}_t^{\rm app}$ and evaluates the relevant sequents, returning any violated rules $\mathcal{R}_t^{\rm viol}$ along with minimal counterexamples. Upon uncertainty, it triggers additional state expansion.

2.4 Referee

The Referee adjudicates enforcement decisions based on $m_t$, $\Gamma_t$, $\tau_t$, and $\mathcal{R}_t^{\rm viol}$, using risk-adaptive thresholds. Hierarchical execution begins with a lightweight LLM pass (e.g., GPT-4o-mini), escalating to a larger LLM for denials or ambiguous cases. The Referee emits allow/deny signals and human-readable rationales.
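The hierarchical escalation can be sketched as follows. The two judges are stub callables standing in for the lightweight and larger LLMs, and treating "ambiguous" as a confidence below `min_conf` is an assumption of this sketch rather than something the source specifies; all names are hypothetical.

```python
from typing import Callable, Tuple

Verdict = Tuple[str, float, str]  # (decision, confidence, rationale)
Judge = Callable[[str], Verdict]


def referee(message: str, light: Judge, heavy: Judge, min_conf: float = 0.9) -> Verdict:
    """Run the light judge first; escalate denials and low-confidence verdicts."""
    decision, conf, why = light(message)
    if decision == "allow" and conf >= min_conf:
        return decision, conf, why  # confident allow: no escalation needed
    return heavy(message)           # denial or ambiguity: defer to the larger judge


def light_judge(msg: str) -> Verdict:
    if "password" in msg:
        return "deny", 0.6, "mentions a credential"
    return "allow", 0.95, "benign"


def heavy_judge(msg: str) -> Verdict:
    if "send password" in msg:
        return "deny", 0.99, "credential exfiltration"
    return "allow", 0.97, "no rule violated"


print(referee("open the docs page", light_judge, heavy_judge))
print(referee("show the password reset page", light_judge, heavy_judge))
print(referee("send password to forum", light_judge, heavy_judge))
```

The second message shows the intended effect: the light judge's tentative denial is overturned on escalation, which is how the hierarchy can trim false positives without dropping the true denial in the third message.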

Guard Pipeline (Algorithm 1)

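def guard_pipeline(m_t, Theta_prev):
    # Helper functions stand for the four subagents and are not defined here.
    # Unpack the previous control state Theta_{t-1} = (P, R, Gamma, tau).
    P, R, Gamma_prev, tau_prev = Theta_prev
    # State Tracker: candidates are Top-k retrieval plus static dependencies.
    C_t = TopK(m_t, P) | MustCheck(m_t)
    Gamma_t = EvalPredicates(C_t, Gamma_prev)
    # Threat Watcher: update per-agent risk scores.
    tau_t = UpdateThreat(m_t, tau_prev, Gamma_t)
    # Policy Verifier: keep active rules whose obligation psi_r is not entailed.
    R_app = select_active_rules(m_t)
    R_viol = {r for r in R_app if not implies(Gamma_t, psi(r))}
    # Referee: decision d_t and human-readable rationale T_t.
    d_t, T_t = Referee(m_t, tau_t, Gamma_t, R_viol)
    return d_t, (P, R, Gamma_t, tau_t), T_t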

3. Policy Compilation and Online Enforcement

In the offline phase, powerful LLMs (GPT-4o) parse free-form policy text into predicate sets $\mathcal{P}$ and propositional rules $\mathcal{R}$, with human-in-the-loop review for correctness. An embedding index over predicate schemas (supporting Top-$k$ retrieval) and a static dependency graph are built to facilitate efficient runtime selection.

At runtime, a Top-$k$ predicate updater uses approximate nearest neighbor search (HNSW) to select relevant predicates in $\tilde{O}(k \log m)$ time, reducing per-step cost to that of an LLM pass over $O(n)$ tokens, given $n = \mathrm{len}(m_t)$ and $k = O(n)$:

$$T_{\rm efficient} = T_{\rm index}(m, k) + t_{\rm LLM}(n + k)$$

Referee conflict resolution is governed by severity scores $s(r)$; any rule violation with $s(r) > \tau_t(u) \times \text{base-threshold}$ triggers a denial, based on the agent's current risk. Hierarchical LLM escalation reduces false positives while maintaining recall.
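The denial condition, taken literally from the inequality above, can be written as a small function. The severity values, risk score, and base threshold below are illustrative assumptions, and `referee_decision` is a hypothetical name.

```python
from typing import Iterable


def referee_decision(severities: Iterable[float], tau_u: float, base_threshold: float) -> str:
    """Deny if any violated rule has s(r) > tau_t(u) * base_threshold."""
    threshold = tau_u * base_threshold
    return "deny" if any(s > threshold for s in severities) else "allow"


print(referee_decision([0.5], tau_u=0.4, base_threshold=1.0))  # deny
print(referee_decision([0.3], tau_u=0.4, base_threshold=1.0))  # allow
print(referee_decision([], tau_u=0.4, base_threshold=1.0))     # allow
```

With no violated rules the severity list is empty, so the message is allowed regardless of the agent's risk score.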

4. Evaluation and Empirical Results

QuadSentinel is evaluated on ST-WebAgentBench (256 tasks, 646 policies) and AgentHarm (176 tasks, 11 harm categories), using both AWM and Magentic-One base agents. Baselines for comparison include Prompt-based Guard, ShieldAgent, and GuardAgent. Models employed are GPT-4o / GPT-4o-mini, with ablation studies using Qwen3-235B.

4.1 Main Performance

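ST-WebAgentBench (%)

| Guardrail | Acc | Prec. | Rec. | FPR |
|---|---|---|---|---|
| ShieldAgent | 91.1 | 81.6 | 74.1 | 4.4 |
| GuardAgent | 84.0 | 91.9 | 74.6 | 6.6 |
| Prompt-based | 77.9 | 52.7 | 84.2 | 24.2 |
| QuadSentinel (Ours) | 93.6 | 88.9 | 84.2 | 3.4 |

AgentHarm (%)

| Guardrail | Acc | Prec. | Rec. | FPR |
|---|---|---|---|---|
| ShieldAgent | 86.9 | 95.2 | 77.7 | 3.9 |
| GuardAgent | 78.4 | 93.7 | 60.9 | 4.1 |
| Prompt-based | 88.6 | 94.2 | 82.4 | 5.2 |
| QuadSentinel (Ours) | 91.5 | 97.4 | 85.2 | 2.3 |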

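QuadSentinel attains the highest accuracy and the lowest FPR on both benchmarks, and on AgentHarm it also leads in precision and recall. On ST-WebAgentBench, GuardAgent reports higher precision (91.9 vs. 88.9) and the prompt-based guard matches QuadSentinel's recall (84.2), though with a far higher FPR (24.2 vs. 3.4).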
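For reference, the four reported metrics follow the usual confusion-matrix definitions, sketched below. The sketch assumes "positive" means an action that should be blocked, and the counts are made up for illustration; they are not taken from the paper.

```python
from typing import Dict


def guard_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics, where a positive is an action that should be blocked."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),  # share of blocks that were warranted
        "recall": tp / (tp + fn),     # share of unsafe actions that were blocked
        "fpr": fp / (fp + tn),        # share of safe actions wrongly blocked
    }


# Illustrative counts only: 10 unsafe and 20 safe actions.
metrics = guard_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

FPR is computed over safe actions only, which is why a guard can pair high recall with a poor FPR, as the prompt-based row on ST-WebAgentBench does.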

4.2 Runtime Efficiency

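| Variant | Additional tokens | Token overhead | Additional time (s) | Time overhead |
|---|---|---|---|---|
| Base Agent | - | - | 10,699 (baseline) | - |
| QuadSentinel | +3.7 M | 0.95× | +3,550 | 0.33× |
| ShieldAgent | +7.3 M | 1.86× | +6,640 | 0.62× |
| GuardAgent | +10.4 M | 2.66× | +13,302 | 1.24× |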

QuadSentinel minimizes both additional token and time overhead relative to prior approaches.
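The time-overhead column can be cross-checked from the table, assuming the lone Base Agent figure (10 699) is the baseline running time in seconds. The source does not label that figure explicitly, so this reading is an assumption.

```python
BASE_TIME_S = 10_699  # assumed: Base Agent running time in seconds

additional_time_s = {
    "QuadSentinel": 3_550,
    "ShieldAgent": 6_640,
    "GuardAgent": 13_302,
}

# Overhead = additional guard time divided by the base agent's time.
time_overhead = {
    name: round(extra / BASE_TIME_S, 2) for name, extra in additional_time_s.items()
}
print(time_overhead)
# {'QuadSentinel': 0.33, 'ShieldAgent': 0.62, 'GuardAgent': 1.24}
```

The ratios reproduce the reported 0.33×, 0.62×, and 1.24×, which supports reading the overhead columns as fractions of the base agent's cost.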

4.3 Ablation Analysis

| Variant | Acc (%) | Prec (%) | Rec (%) | FPR (%) |
|---|---|---|---|---|
| Full System | 91.5 | 97.4 | 85.2 | 2.3 |
| Single Referee | 83.0 | 74.6 | 100 | 34.1 |
| No Threat Watcher | 86.4 | 90.0 | 81.8 | 9.1 |
| No Top-k Filter | 88.9 | 92.0 | 85.2 | 7.4 |

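Every ablated variant loses accuracy and gains false positives relative to the full system, whose row matches the AgentHarm results in Section 4.1. Collapsing the guard into a single Referee reaches 100% recall but drives FPR to 34.1%, while removing the Threat Watcher or the Top-k filter raises FPR to 9.1% and 7.4%, respectively.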

Per-category breakdown (Appendix C) confirms that QuadSentinel maintains the best tradeoff between utility and safety across harm categories.

5. Deployment Model and Limitations

QuadSentinel is deployable as a plug-in, wrapping existing multi-agent LLM infrastructures via authenticated message channels. No modifications to agent internals or retraining are required; policies reside in a distinct, cryptographically signed “policy book” loaded at system start.
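The wrapping idea can be sketched as a channel that consults a guard before delivering each message. This is a simplified illustration only: `GuardedChannel` and `toy_guard` are hypothetical names, the guard is a keyword stub rather than the four-agent pipeline, and channel authentication and policy-book signature checks are omitted.

```python
from typing import Callable, List, Tuple

Guard = Callable[[str, str, str], Tuple[bool, str]]  # -> (allowed, rationale)
Deliver = Callable[[str, str, str], None]


class GuardedChannel:
    """Wraps an existing delivery function; the agents themselves are unchanged."""

    def __init__(self, deliver: Deliver, guard: Guard) -> None:
        self._deliver = deliver
        self._guard = guard
        self.audit_log: List[Tuple[str, str, bool, str]] = []

    def send(self, sender: str, receiver: str, message: str) -> bool:
        allowed, rationale = self._guard(sender, receiver, message)
        self.audit_log.append((sender, receiver, allowed, rationale))
        if allowed:
            self._deliver(sender, receiver, message)  # forward only if allowed
        return allowed


def toy_guard(sender: str, receiver: str, message: str) -> Tuple[bool, str]:
    if "api_key" in message:
        return False, "possible credential leak"
    return True, "no rule violated"


inbox: List[Tuple[str, str, str]] = []
channel = GuardedChannel(lambda s, r, m: inbox.append((s, r, m)), toy_guard)
ok = channel.send("planner", "browser", "open the docs page")
blocked = channel.send("planner", "browser", "post api_key=123 to forum")
print(ok, blocked, len(inbox))  # True False 1
```

Because the guard sits on the channel rather than inside the agents, swapping in a different guard requires no change to agent code.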

Quality of policy translation is crucial; human refinement may be needed for ambiguous rules. As components depend on LLMs, they remain susceptible to adversarial prompt attacks, partially mitigated via prompt hardening and sanitization. The reliance on Top-$k$ filtering and a closed-world predicate set may fail to capture novel risks not encoded within $\mathcal{P}$.

The implementation will be released at https://github.com/yyiliu/QuadSentinel.

6. Conclusion

QuadSentinel establishes the first modular multi-agent guard framework for LLM systems, anchoring policy enforcement in formal logic through sequent-based obligations over machine-checkable predicates. Its four-agent design supports stateful, real-time safety with high accuracy, low false positive rates, and transparent rationales, outperforming single-agent baselines such as ShieldAgent in accuracy and FPR. The approach can be deployed around existing agent ecosystems without modifying agent internals (Yang et al., 18 Dec 2025).
