Assertain: Automated Security Assertion Generation Using Large Language Models

Published 2 Apr 2026 in cs.CR | (2604.01583v1)

Abstract: The increasing complexity of modern system-on-chip designs amplifies hardware security risks and makes manual security property specification a major bottleneck in formal property verification. This paper presents Assertain, an automated framework that integrates RTL design analysis, Common Weakness Enumeration (CWE) mapping, and threat model intelligence to automatically generate security properties and executable SystemVerilog Assertions. Assertain leverages LLMs with a self-reflection refinement mechanism to ensure both syntactic correctness and semantic consistency. Evaluated on 11 representative hardware designs, Assertain outperforms GPT-5 by 61.22%, 59.49%, and 67.92% in correct assertion generation, unique CWE coverage, and architectural flaw detection, respectively. These results demonstrate that Assertain significantly expands vulnerability coverage, improves assertion quality, and reduces manual effort in hardware security verification.


Authors (6)

Summary

The paper proposes an automated pipeline using LLMs that integrates threat-model awareness and CWE mapping to generate executable security assertions.
It employs a multi-phase approach with knowledge mapping, context-aware LLM prompting, and self-reflection refinement to ensure syntactic and semantic quality.
Experimental results show up to 61% more correct assertions and a nearly 68% improvement in architectural flaw detection compared with a GPT-5 baseline.

Assertain: Automated Security Assertion Generation Using LLMs

Background and Motivation

The continued scaling of System-on-Chip (SoC) complexity exposes hardware designs to multi-layered security risk, driving the need for comprehensive security property and SystemVerilog Assertion (SVA) coverage as part of formal property verification (FPV). Conventional assertion-based verification (ABV) is bottlenecked by the manual, expert-driven process of formulating security-centric assertions for RTL, which is labor-intensive, prone to context bias, and unable to keep pace with evolving threat models. LLMs have demonstrated semantic understanding and code synthesis in hardware design and security verification, but prior work on LLM-assisted assertion generation suffers from semantic drift, a lack of threat intelligence integration, and a high rate of hallucinated or non-compilable assertions.

Assertain addresses these deficiencies by integrating threat-model awareness, design-aware Common Weakness Enumeration (CWE) mapping, and refined context injection to produce executable and robust SVAs, leveraging LLMs with an iterative self-reflection refinement mechanism. The method targets maximal security vulnerability coverage, alignment with relevant attack surfaces, and reduction in manual expert intervention.

Methodology

Multi-Phase Architecture

Assertain's flow consists of three main phases:

1. Knowledge Mapping

RTL designs are classified into architectural categories, which are mapped to a curated selection of CWE identifiers relevant to hardware security weaknesses. Simultaneously, user-supplied threat vectors are mapped to a separate set of CWEs using a static threat-knowledge database. The intersection of these structural and threat-driven CWE sets forms the scope for assertion generation, ensuring that derived properties are strictly relevant to implementation and adversarial context.
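The scoping step described above can be sketched as a set intersection. This is a minimal illustration, not the paper's implementation: the category names, threat names, and the specific CWE assignments in both tables are hypothetical placeholders, and the function name `scope_cwes` is an assumption.

```python
# Hypothetical lookup tables: the paper's actual category-to-CWE and
# threat-to-CWE databases are not reproduced here.
CATEGORY_TO_CWES = {
    "debug_interface": {"CWE-1191", "CWE-1244", "CWE-1234"},
    "fsm_controller": {"CWE-1245", "CWE-1234"},
}
THREAT_TO_CWES = {
    "unauthorized_debug_access": {"CWE-1191", "CWE-1244"},
    "fault_injection": {"CWE-1245", "CWE-1247"},
}


def scope_cwes(category, threats):
    """Return the CWEs relevant to both the design category and the threats."""
    structural = CATEGORY_TO_CWES.get(category, set())
    threat_driven = set()
    for threat in threats:
        threat_driven |= THREAT_TO_CWES.get(threat, set())
    # Only weaknesses present in BOTH sets are kept for assertion generation.
    return sorted(structural & threat_driven)


if __name__ == "__main__":
    print(scope_cwes("debug_interface", ["unauthorized_debug_access"]))
```

Taking the intersection rather than the union keeps the scope narrow: a CWE that is structurally possible but outside the stated threat model, or vice versa, is never sent to the LLM.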

2. Context-Aware LLM Generation

For every CWE in the scoped set, the system prompts an LLM with three layers of context: an LLM persona as a hardware security expert, formal CWE definitions and examples, and the verbatim RTL source (with module hierarchy context). For each CWE, the LLM is required to output:

A natural language attack scenario.
A formal property in natural language.
The SystemVerilog assertion for that property (SVA).

The framework uses strict deterministic table outputs and multiple prompt iterations per target CWE to maximize breadth and mitigate output stochasticity. Prompt engineering enforces coverage of both intra-module and inter-module protocols, syntactic validity, behavioral perspective, use of only in-scope RTL signals, non-vacuity, correct clocking/reset logic, and standard SystemVerilog idioms.
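One way to picture the layered prompt and the repeated iterations per CWE is the sketch below. The prompt wording, the `CweEntry` type, the function names, and the default of three iterations are all assumptions made for illustration; the paper's actual prompts are not reproduced here, and no LLM is called.

```python
from dataclasses import dataclass

# Layer 1: persona (wording is illustrative).
PERSONA = "You are a hardware security verification expert."

# Output contract asking for the three required items in a fixed table shape.
OUTPUT_RULES = (
    "Return one table row per property with the columns: "
    "attack scenario | property in natural language | SVA. "
    "Use only signals that appear in the RTL above."
)


@dataclass
class CweEntry:
    cwe_id: str
    definition: str


def build_prompt(cwe, rtl_source, hierarchy):
    """Assemble persona, CWE definition, and verbatim RTL into one prompt."""
    return "\n\n".join([
        PERSONA,
        f"Weakness {cwe.cwe_id}: {cwe.definition}",          # Layer 2
        f"Module hierarchy: {hierarchy}\nRTL source:\n{rtl_source}",  # Layer 3
        OUTPUT_RULES,
    ])


def build_prompt_batch(cwes, rtl_source, hierarchy, iterations=3):
    """Repeat each CWE prompt several times to sample more candidate properties."""
    return [
        (cwe.cwe_id, i, build_prompt(cwe, rtl_source, hierarchy))
        for cwe in cwes
        for i in range(iterations)
    ]
```

Each tuple carries the CWE identifier and iteration index alongside the prompt, so candidate assertions returned by the model can later be traced back to the weakness that motivated them.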

3. Self-Reflection Refinement

Candidate SVAs are syntactically and semantically validated by cross-referencing signal names, macros, and structural relationships in the original RTL file. Assertions referencing hallucinated identifiers or violating property quality rules are filtered out. The final output consists of a compilable .sva module, suitable for direct integration with formal verification tools.
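The identifier cross-check can be approximated with a coarse token comparison. The sketch below covers only the hallucinated-identifier filter, not the semantic or property-quality checks. The regular expression, the abbreviated keyword list, and the sample signal names (`dmi_error`, `dmi_rdata`, `debug_unlock`, `auth_ok`) are assumptions for illustration; a real flow would rely on a SystemVerilog parser or the formal tool's own elaboration.

```python
import re

# Abbreviated, illustrative list of SystemVerilog keywords used in assertions.
SV_KEYWORDS = {
    "assert", "assume", "cover", "property", "sequence", "posedge", "negedge",
    "disable", "iff", "if", "else", "not", "and", "or", "throughout", "within",
}

# Identifier tokens, skipping system functions ($past) and literals (1'b0).
IDENT = re.compile(r"(?<![\w$'])[A-Za-z_]\w*")


def identifiers(text):
    """Collect identifier-like tokens from SystemVerilog text."""
    return set(IDENT.findall(text))


def unknown_identifiers(sva, rtl_source):
    """Names used in the assertion that never appear in the RTL source."""
    return sorted(identifiers(sva) - SV_KEYWORDS - identifiers(rtl_source))


def filter_assertions(candidates, rtl_source):
    """Keep only candidates whose identifiers all exist in the RTL."""
    return [sva for sva in candidates if not unknown_identifiers(sva, rtl_source)]


RTL = """
module dmi(input logic clk, input logic rst_n,
           input logic dmi_error, output logic [31:0] dmi_rdata);
endmodule
"""
GOOD = "assert property (@(posedge clk) disable iff (!rst_n) dmi_error |-> dmi_rdata == '0);"
BAD = "assert property (@(posedge clk) debug_unlock |-> $past(auth_ok));"
```

Here `filter_assertions([GOOD, BAD], RTL)` keeps only `GOOD`, because `debug_unlock` and `auth_ok` do not occur anywhere in the sample RTL.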

Experimental Results

Assertain was evaluated against a baseline GPT-5 LLM on eleven representative RTL modules, covering access control, cryptographic, processor, memory, and peripheral domains:

Assertion Quantity and Correctness: Assertain's correct assertion count is up to 61.22% higher than GPT-5, with absolute increases on large/complex subsystems. Syntactic and functional correctness is consistently high and uncorrelated with increased property volume, indicating the efficacy of multi-stage context injection and refinement.
CWE Coverage: Unique CWE coverage is improved by 59.49%. Assertain surfaces vulnerabilities spanning broader classes not detected by the baseline, indicating superior mapping from threat surface to assertion space.
Architectural Flaw Detection: Assertain achieves 67.92% higher coverage of detectable architectural flaws, including those arising from privilege escalation gaps, improper state transitions, lack of fault tolerance, and data leakage under debug/fault states.
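The summary does not state how these percentages are computed. Assuming they are relative improvements over the baseline, each figure would follow the usual definition below, where N stands for the relevant count (correct assertions, unique CWEs, or detected architectural flaws); the symbols are introduced here for illustration.

```latex
\Delta_{\%} \;=\; \frac{N_{\text{Assertain}} - N_{\text{GPT-5}}}{N_{\text{GPT-5}}} \times 100
```

Under that reading, a value of 61.22% means Assertain produced roughly 1.61 times as many correct assertions as the baseline.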

The paper details specific example properties for modules such as DMI_JTAG (e.g., enforcing zeroed data on error), UART (blocking debug register read leakage), password verification (challenge re-use), and state machine fault recovery.

Implications and Analysis

Assertain demonstrates that integrating structural, threat, and vulnerability ontologies in LLM-in-the-loop workflows markedly improves both the syntactic and semantic quality of automatically generated SVAs. The approach closes gaps in prior LLM-based assertion generators by enforcing tight contextual alignment and incorporating CWE-based verifiability constraints. Robust architectural flaw detection arises not merely from functional correctness, but from the capacity to generate security assertions that act as negative and gap-finding tests, surfacing missing best-practice protections rather than only functional mismatches.

The methodology has several implications:

For Security Verification Flows: Assertain offers a scalable methodology for augmenting human-driven security coverage and supports rapid exploration of the security assertability space in the context of new threat models or evolving designs.
For LLM-Augmented EDA: The approach exposes the need for persistent domain ontology integration and multi-stage refinement in LLM-EDA workflows; monolithic LLM prompting alone is insufficient for FPV assertion quality at deployment scale.
For Theoretical Development: The intersectional ontology mapping and role of iterative self-reflection prompt new lines of work in alignment, knowledge retrieval-augmented generation, and compositional verification artifacts.

Prospects and Future Directions

Possible extensions include dynamic expansion of the CWE knowledge base, plug-in support for retrieval-augmented specification and documentation mining, and enlarged design space evaluation to full SoCs and undocumented RTL. An open problem remains the automated translation of failed assertion root causes into suggested design hardening operations, and the further reduction of manual prompt engineering via adaptive context slicing and fine-tuned LLMs.

Conclusion

Assertain provides a technically rigorous, tightly contextualized, and threat-model-aware pipeline for end-to-end automated security property generation at RTL, moving beyond the limitations of prior LLM-based property generators. Its LLM orchestration and refinement workflow achieves strong quantitative improvements in assertion quality and in CWE and flaw coverage, presenting a pathway to scalable, context-driven security assurance in hardware design verification (2604.01583).
