- The paper introduces a Skill Description Language (SDL) that encodes both structured endpoints and natural language policies for LLM-driven agent skills.
- The Constraint-Guided Representation Synthesis (CGRS) method iteratively refines candidate translations with LLM proposals and validator feedback to ensure high semantic alignment.
- The Datalog-based detection approach finds that over 50% of evaluated skills contain at least one critical semantic risk and uncovers 17 zero-day vulnerabilities, while achieving 97.7% recall and 90.6% F1 on a human-labeled sample.
Auditing Agent Skills via Constraint-Guided Representation Synthesis
Problem Setting and Motivation
The paper presents Semia, a static auditor designed for the analysis of "agent skills," which are configuration artifacts enabling LLM-driven agents to undertake high-impact, autonomous actions such as signing blockchain transactions, executing shell commands, and automating communication tasks. These skills encapsulate capabilities using a dual structure: a machine-readable, structured interface specifying endpoints and permissions, and a natural language segment specifying operational policy. The security boundary is often encoded only in the unstructured, natural-language policy, which is evaluated by the LLM at inference time. This design creates a significant attack surface: adversaries capable of influencing the LLM's context can subvert policy enforcement via indirect prompt injection (IPI), with no code-level vulnerability present.
Existing approaches fall short in this domain. Signature-based and syntactic analyzers lack the expressivity to capture the semantics of natural-language policies. LLM-as-a-judge approaches are highly non-deterministic across runs and cannot give reproducible, formal guarantees. Run-time detection methods (e.g., prompt injection monitors) mitigate exploits only after deployment, so they cannot serve as a proactive audit.
Contributions
- Skill Description Language (SDL): The authors introduce a compact, relational fact schema that encodes both structured and unstructured security-relevant agent skill content for static analysis.
- Constraint-Guided Representation Synthesis (CGRS): The translation of hybrid skills into SDL is framed as a bounded, structured synthesis loop:
  - An LLM proposes candidate translations.
  - Non-LLM validators enforce syntactic and semantic well-formedness.
  - Semantic distance is measured by round-tripping (verbalizing) the candidate and comparing to the original.
  - Hints are generated for failed candidates (reference or flow errors), improving convergence and faithfulness.
- Datalog-Based Detection: Once a skill is lifted into SDL, security properties—including diverse indirect injection paths, secret leakage, confused deputy risks, and gate absence—are checked via deterministic Datalog queries with complete witness paths.
- Large-Scale Evaluation and Zero-Days: Evaluation on 13,728 real skills identifies that over 50% contain at least one critical semantic risk. On a 541-skill human-labeled sample, Semia achieves 97.7% recall and 90.6% F1, outperforming commercial and marketplace baselines. Seventeen novel zero-day vulnerabilities are found and responsibly disclosed.
Technical Approach
SDL Fact Schema
SDL captures:
- Skill and action hierarchy (skill, action, call)
- Data flow edges, parameter bindings
- Annotations (trigger source, trustedness, sensitivity, authorization gates, sanitizers)
- Explicit documentation claims and provenance of secrets
- Indicators for obfuscation, binary payloads, and conditionally reachable code
SDL's expressivity is tailored to the agent-skills domain, with closed enumerations (effects, triggers, barriers, claims) extensible as new primitives emerge.
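To make the idea of a relational fact schema with closed enumerations concrete, the following is a minimal sketch in Python. It is not the paper's actual SDL: the relation names (`Call`, `Flow`, `Annotation`), the enumeration members, and the example skill are all hypothetical, chosen only to mirror the categories listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    # Hypothetical closed enumeration of call effects (not the paper's list).
    SHELL_EXEC = "shell_exec"
    SIGN_TX = "sign_tx"
    NET_SEND = "net_send"


@dataclass(frozen=True)
class Call:
    # One row of a skill/action/call hierarchy relation.
    skill: str
    action: str
    call_id: str
    effect: Effect


@dataclass(frozen=True)
class Flow:
    # A data-flow edge between two named nodes.
    src: str
    dst: str


@dataclass(frozen=True)
class Annotation:
    # A key/value annotation on a node, e.g. trustedness or sensitivity.
    target: str
    key: str
    value: str


def parse_effect(name: str) -> Effect:
    """Reject any effect outside the closed enumeration (raises ValueError)."""
    return Effect(name)


# Illustrative fact base for an invented wallet skill.
facts = [
    Call("wallet", "transfer", "c1", Effect.SIGN_TX),
    Flow("user_msg", "c1"),
    Annotation("user_msg", "trusted", "false"),
]
```

Because the enumeration is closed, a candidate translation that invents an unknown effect fails at parse time instead of silently entering the fact base; extending the schema means adding a member to the enumeration.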
Constraint-Guided Lifting
Translation is achieved by a loop alternating between LLM-based proposal and automated, programmatic verification:
- Structural validators enforce referential integrity, connected dataflow, and consistent annotation references.
- Semantic alignment uses a round-trip metric: the candidate is verbalized back into text and compared to the original prose using a unit-coverage metric. Only high-alignment candidates are accepted.
- Diagnosis/Repair: When a candidate is structurally invalid, targeted hints drive the LLM to address specific flaws, improving convergence.
This approach is necessary as single-shot LLM translation suffers from hallucinations, omissions, and structural ambiguity.
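The propose, validate, and repair loop described above can be sketched as follows. This is an illustrative outline under stated assumptions, not Semia's implementation: the fact representation, the `max_rounds` and `threshold` defaults, the set-overlap coverage formula, and the stub proposer that stands in for the LLM are all hypothetical.

```python
def structural_errors(facts):
    """Return hints for flow edges that reference undeclared nodes."""
    hints = []
    for src, dst in sorted(facts["flows"]):
        for end in (src, dst):
            if end not in facts["nodes"]:
                hints.append(
                    f"flow ({src} -> {dst}) references undeclared node '{end}'"
                )
    return hints


def coverage(original_units, verbalized_units):
    """Fraction of original policy units recovered by the round trip."""
    if not original_units:
        return 1.0
    return len(original_units & verbalized_units) / len(original_units)


def synthesize(propose, verbalize, original_units, max_rounds=5, threshold=0.9):
    """Bounded loop: propose, validate structure, check alignment, repair."""
    hints = []
    for _ in range(max_rounds):
        candidate = propose(hints)             # the LLM call in a real system
        hints = structural_errors(candidate)   # non-LLM validator
        if hints:
            continue                           # retry with targeted hints
        score = coverage(original_units, verbalize(candidate))
        if score >= threshold:
            return candidate                   # accept high-alignment candidate
        hints = [f"round-trip coverage {score:.2f} below {threshold}"]
    return None                                # budget exhausted, no candidate


def stub_propose(hints):
    # Stand-in for the LLM: the first attempt omits a node, the retry fixes it.
    if not hints:
        return {"nodes": {"user_msg"}, "flows": {("user_msg", "shell")}}
    return {"nodes": {"user_msg", "shell"}, "flows": {("user_msg", "shell")}}


def stub_verbalize(facts):
    # Stand-in verbalizer: one sentence-like unit per flow edge.
    return {f"{s} flows to {d}" for s, d in facts["flows"]}


result = synthesize(stub_propose, stub_verbalize, {"user_msg flows to shell"})
```

In this toy run the first candidate fails the referential-integrity check, the resulting hint is passed back to the proposer, and the second candidate is accepted. The loop is bounded, so a proposer that never produces a valid candidate yields `None` rather than running indefinitely.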
Reachability and Detection
Once lifted, analysis is deterministic and tractable:
- Datalog closure relations capture transitive data, taint, and privilege propagation.
- Predicates distinguish taint (integrity), secret (confidentiality), and control-flow reachability.
- Eleven detectors cover findings including:
  - Unguarded high-privilege sinks (e.g., no human-approval barrier on transactional calls)
  - Taint flows (e.g., attacker-controlled data reaches shell execution)
  - Secret overreach and exfiltration (e.g., local secrets routed to untrusted endpoints)
  - Structural anomalies (obfuscated code, unresolved dependencies, dormant conditional payloads)
All findings include precise witness paths through the fact base, facilitating review.
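As a rough illustration of deterministic reachability with witness paths, the sketch below evaluates two Datalog-style rules by naive fixpoint iteration in Python. It is a simplification, not one of the paper's eleven detectors: the node names are invented, and the gate check looks only at the sink itself, whereas a real detector would presumably also consider barriers along the path.

```python
def tainted_paths(flows, untrusted_sources):
    """Naive fixpoint for the rules:
         reach(S, S) :- untrusted(S).
         reach(S, Y) :- reach(S, X), flow(X, Y).
    Keeps one witness path per derived (source, node) pair."""
    witness = {(s, s): [s] for s in untrusted_sources}
    changed = True
    while changed:
        changed = False
        for (src, node), path in list(witness.items()):
            for a, b in flows:
                if a == node and (src, b) not in witness:
                    witness[(src, b)] = path + [b]
                    changed = True
    return witness


def unguarded_tainted_sinks(flows, untrusted_sources, sinks, gated):
    """Report witness paths from untrusted sources to sinks lacking a gate."""
    findings = []
    for (_, node), path in tainted_paths(flows, untrusted_sources).items():
        if node in sinks and node not in gated:
            findings.append(path)
    return sorted(findings)


# Hypothetical fact base: a fetched web page feeds a summary that reaches a shell call.
flows = {
    ("web_page", "summary"),
    ("summary", "shell_exec"),
    ("user_cfg", "sign_tx"),
}
findings = unguarded_tainted_sinks(
    flows,
    untrusted_sources={"web_page"},
    sinks={"shell_exec", "sign_tx"},
    gated=set(),
)
print(findings)  # [['web_page', 'summary', 'shell_exec']]
```

Each derived fact is added at most once, so the iteration terminates even on cyclic flow graphs, and the same fact base always yields the same findings. The stored path is what a reviewer would inspect to confirm or dismiss the report.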
Evaluation and Results
Semia is benchmarked against VirusTotal (industry signatures) and ClawScan (marketplace heuristics + LLM). On the labeled sample:
- Precision: 84.5%
- Recall: 97.7%
- F1: 90.6%
VirusTotal attains high precision (89%) but under 14% recall, missing most semantic vulnerabilities. ClawScan improves recall, but at a significant cost in precision (73%).
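As a quick consistency check, Semia's reported F1 can be recomputed from its reported precision and recall using the standard harmonic-mean definition. The helper name `f1` is ours; the inputs are the figures quoted above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


semia_f1 = f1(0.845, 0.977)
print(f"{semia_f1:.3f}")  # 0.906, matching the reported 90.6%
```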
Ablation studies reveal:
- Direct LLM judgment alone is unstable and unreliable (F1: 70.7%).
- Introducing SDL with Datalog reachability raises F1 by 16 points.
- Iterative refinement adds a further 4 points, for a total gain of about 20 F1 points over the direct-LLM baseline.
Semia also identified 17 critical, exploitable zero-day vulnerabilities that had not been previously reported and were not flagged by mainstream scanners.
Practical and Theoretical Implications
Practical:
- Semia provides tractable, automatable pre-deployment assurance for proposed agent skills, directly addressing marketplace-scale supply chain risks.
- The witness paths it outputs are interpretable by human auditors, mitigating review fatigue and enabling immediate validation or correction.
- The CGRS approach is robust to the surface variability of natural-language policy declarations.
Theoretical:
- The hybrid neuro-symbolic audit paradigm—where LLMs are used exclusively for intermediate representation lifting and not verdict rendering—achieves both expressivity and reproducibility.
- Constraining representation synthesis via structural and semantic feedback loops controls for LLM instability and reduces hallucination/omission errors.
- The Datalog formulation demonstrates that key semantic security queries in hybrid agent artifacts can be stated compactly and checked efficiently, as long as a faithful IR can be synthesized.
Limitations and Future Directions
- The primary failure modes are adversarially complex or ambiguous prose that escapes correct fact extraction, and skills whose policy is distributed across deeply nested third-party dependencies. Addressing cross-repo multi-artifact composition is an open research avenue.
- The method is currently domain-specific; adapting it to other hybrid artifacts (e.g., plugin manifests, cloud workflow pipelines) requires engineering a new schema and populating detection rules.
- While Semia detects the absence of gates and semantic contradictions, runtime enforcement of natural-language gates remains outside its scope.
Potential future work includes compositional multi-artifact analysis, partially automated repair suggestion and transformation, and domain-generalization (e.g., to conversational interface plugins and workflow orchestrators).
Conclusion
Semia formalizes the static audit of LLM-agent skill artifacts as a precise, neuro-symbolic reachability analysis problem, addressing the semantic enforcement gap present in current supply chains. The introduction of CGRS for representation synthesis, coupled with Datalog-based detection, achieves significant improvements in both recall and zero-day discovery rate. The demonstrated efficacy and efficiency of this approach indicate that domain-aware, structured intermediate representation lifting is likely to be foundational for proactive security assurance in the agentic automation era.
Reference: "Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis" (2605.00314)