
AI-Guided Assertion Synthesis

Updated 15 February 2026
  • AI-guided assertion synthesis is a technique that automates the generation of precise, machine-checkable assertions from specifications, code, and behavioral traces.
  • It integrates static AST analysis, LLM-driven CoT prompting, and retrieval-augmented methods to improve semantic depth and functional coverage in both hardware and software verification.
  • This approach leads to higher validation metrics, reduced manual errors, and scalable, efficient assertion-based verification pipelines.

AI-guided assertion synthesis refers to the use of Machine Learning (ML), LLMs, and neural-symbolic hybrid methods to automate or enhance the generation, refinement, and mining of programmatic and hardware assertions from specifications, code, or behavioral traces. Assertion synthesis is central in Assertion-Based Verification (ABV) for both hardware and software—spanning SystemVerilog Assertions (SVAs) for RTL, postcondition/precondition inference for code, test oracle generation, and invariant discovery. The evolution from heuristic mining and human-in-the-loop templating to AST-guided LLM prompting, retrieval-augmented generation, and joint retriever-generator optimization represents a major technical advance, expanding both coverage and semantic depth in assertion generation.

1. Motivation and Problem Landscape

Assertion synthesis addresses the need to generate precise, machine-checkable statements that verify or monitor the correct functional behavior of hardware modules or software units. In digital design, assertions (e.g., SVA) are critical for detecting corner-case bugs, achieving functional coverage, and enabling formal and simulation-based flows. Manual assertion authoring is error-prone, inconsistent across engineers, and fails to scale with system complexity—especially for deep micro-architectural submodules where most logic bugs manifest (Lyu et al., 13 Nov 2025). In software, writing unit-test oracles and behavioral invariants likewise represents a bottleneck (Zhang et al., 22 Feb 2025, Polgreen et al., 2020).

Traditional assertion synthesis methods, reliant solely on parsing natural language specifications or code mining, have significant limitations:

  • Top-level specification parsing often neglects deep module-level logic, missing errors localized in submodules and flagging violations slowly (e.g., assertions firing after 10 cycles rather than 2 in the SHA3 padder case) (Lyu et al., 13 Nov 2025).
  • String-matching or IR-based retrieval approaches lack the semantic generalization to infer non-trivial assertion properties and do not adapt to the structure of the focal test or code under analysis (Zhang et al., 22 Feb 2025).
  • Pure neural synthesis lacks the formal guarantees to ensure generated assertions actually meet logical correctness requirements (Polgreen et al., 2020).

AI-guided assertion synthesis, integrating static analysis, LLMs, retrieval-augmented pipelines, and hybrid learning, is designed to overcome these limitations—enhancing coverage, reducing manual labor, and increasing semantic robustness.

2. Technical Methodologies Across Domains

Several principal methodologies define the current state of AI-guided assertion synthesis.

2.1 Module-Level SVA Mining with AST-LLM Workflows

AssertMiner (Lyu et al., 13 Nov 2025) exemplifies AST-guided assertion mining, particularly at the module and submodule level. The framework uses static structural extraction of the RTL via abstract syntax tree (AST) analysis to derive:

  • Module Call Graph (MCG): nodes are modules; directed edges denote instantiation relationships.
  • I/O Table: tabulates ports, their directions, and parent-child signal connections.
  • Dataflow Graph: edges represent assignment-driven dependencies among signals.
  • Signal Chains: backward traversals on dataflow graphs to connect outputs to source signals driven by module inputs or higher-level ports.
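The signal-chain step can be sketched as a backward traversal over the dataflow graph. The sketch below is illustrative only; the function name `build_signal_chains` and the padder-style signal names are assumptions, not AssertMiner's actual API.

```python
from collections import defaultdict

def build_signal_chains(dataflow_edges, module_inputs, outputs):
    """Backward-traverse a dataflow graph to connect each output to the
    signals that (transitively) drive it. Edge (src, dst) means `dst`
    is assigned from `src`. Illustrative sketch, not AssertMiner's API."""
    drivers = defaultdict(set)          # dst -> set of direct driver signals
    for src, dst in dataflow_edges:
        drivers[dst].add(src)

    chains = {}
    for out in outputs:
        chain, stack, seen = [], [out], set()
        while stack:
            sig = stack.pop()
            if sig in seen:
                continue                # guard against combinational cycles
            seen.add(sig)
            chain.append(sig)
            if sig not in module_inputs:    # stop at primary inputs
                stack.extend(drivers[sig])
        chains[out] = chain
    return chains

# Hypothetical padder-style dataflow: padded_out is driven by padded,
# which is driven by the module inputs data_in and pad_en.
edges = [("data_in", "padded"), ("pad_en", "padded"), ("padded", "padded_out")]
chains = build_signal_chains(edges, {"data_in", "pad_en"}, ["padded_out"])
```

The resulting chain ties the output back to the module inputs that drive it, which is exactly the context the later prompting stages consume.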

An LLM pipeline then proceeds:

  1. AST-based structural extraction as above.
  2. Prompt-driven extraction of concise, functional module specifications from I/O and signal chains.
  3. Decomposition of each module specification into atomic, verifiable features.
  4. LLM-based SVA synthesis, plugging atomic propositions into a unified SVA template (e.g., antecedent |-> consequent form). Prompts include explicit SystemVerilog templates, and low-temperature decoding ensures stability and consistency.

This separation between the specification and the (potentially buggy) implementation mitigates RTL-induced hallucinations. Temporal logic expresses timing relationships, e.g., "∀ t: pad_en(t) ⇒ ◇ (padded_out(t+1) == pad(data_in(t)))".
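Step 4's unified template amounts to string-level instantiation of an antecedent |-> consequent property. A minimal sketch, assuming a hypothetical `emit_sva` helper and the padder signals from the example above:

```python
# Hypothetical unified SVA template in the antecedent |-> consequent form;
# the clocking/reset scaffolding here is an illustrative assumption.
SVA_TEMPLATE = (
    "assert property (@(posedge clk) disable iff (rst) "
    "{antecedent} |-> {delay}{consequent});"
)

def emit_sva(antecedent, consequent, cycles=0):
    """Plug atomic propositions into the template. A nonzero `cycles`
    inserts a ##N delay, mirroring pad_en(t) => padded_out(t+1) == pad(...)."""
    delay = f"##{cycles} " if cycles else ""
    return SVA_TEMPLATE.format(antecedent=antecedent, delay=delay,
                               consequent=consequent)

sva = emit_sva("pad_en", "padded_out == expected_pad", cycles=1)
# -> assert property (@(posedge clk) disable iff (rst)
#    pad_en |-> ##1 padded_out == expected_pad);
```

Keeping the template fixed and letting the LLM supply only the atomic propositions is one way the pipeline limits free-form generation to the parts it can check.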

2.2 Retrieval-Augmented Deep Assertion Generation

RetriGen (Zhang et al., 22 Feb 2025) and AG-RAG (Zhang et al., 15 Feb 2025) employ retrieval-augmented generation in software unit-test assertion synthesis. The pipeline includes:

  • Construction of a hybrid retriever: token-based (Jaccard) and semantic embedding-based (cosine) similarity are combined to pull the most relevant historical test-assert pairs (TAPs) from an external codebase.
  • The input focal test, concatenated with the retrieved assertion, conditions a pre-trained sequence-to-sequence Transformer (e.g., CodeT5).
  • AG-RAG further tightens performance by jointly training the retriever and the generator, so retrieval probabilities and generation losses are optimized in a single differentiable objective. The retriever thus learns to select TAPs that maximize assertion accuracy in context, while the generator learns to edit and reuse retrieved scaffolds for higher-quality outputs.
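The hybrid retriever can be sketched as a weighted blend of Jaccard and cosine scores. The weighting `alpha` and all names below are illustrative assumptions, not RetriGen's reported configuration:

```python
import math

def jaccard(a_tokens, b_tokens):
    """Token-level Jaccard similarity (the lexical half of the retriever)."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(u, v):
    """Cosine similarity between embedding vectors (the semantic half)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(q_tok, c_tok, q_emb, c_emb, alpha=0.5):
    """Blend lexical and semantic similarity; alpha is an assumed weight."""
    return alpha * jaccard(q_tok, c_tok) + (1 - alpha) * cosine(q_emb, c_emb)

def retrieve(query, corpus, k=1):
    """Top-k historical test-assert pairs by hybrid score.
    Each corpus entry: (tokens, embedding, assertion_text)."""
    q_tok, q_emb = query
    ranked = sorted(corpus,
                    key=lambda c: hybrid_score(q_tok, c[0], q_emb, c[1]),
                    reverse=True)
    return [c[2] for c in ranked[:k]]

# Tiny illustrative corpus with toy 2-d embeddings.
corpus = [
    (["assert", "equals", "sum"], [1.0, 0.0], "assertEquals(expected, sum)"),
    (["flag"], [0.0, 1.0], "assertTrue(flag)"),
]
best = retrieve((["assert", "sum"], [1.0, 0.0]), corpus)
```

The retrieved assertion is then prepended to the focal test before generation; in AG-RAG the same scores become differentiable retrieval probabilities.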

Retrieval augmentation improves both exact-match accuracy (57.66% for RetriGen; +21–27 pp for AG-RAG vs. EditAS) and structure-aware CodeBLEU over baselines that treat retrieval and generation in isolation (Zhang et al., 22 Feb 2025, Zhang et al., 15 Feb 2025).

2.3 LLM-Driven SVA Generation from Multimodal Specifications

AssertLLM (Yan et al., 2024, Fang et al., 2024), AssertCoder (Tian et al., 14 Jul 2025), and LAAG-RV (Maddala et al., 2024) generalize SVA synthesis to ingest entire multimodal design specifications, including text, diagrams, tables, and waveforms:

  • Structured extraction of per-signal semantic templates (name, definition, I/O type, description, interconnections).
  • Waveform or diagram analysis, processed by dedicated modules or LLMs, to instantiate timing constraints and behavioral templates.
  • Chain-of-Thought (CoT) or multi-step prompting guides LLMs through decomposition, pattern selection (implication, stability, liveness), temporal binding, and SVA synthesis.
  • Iterative validation and refinement: Generated SVAs are validated via simulation or formal tools (e.g., JasperGold), with failed assertions injected back into the LLM loop after incorporating failure logs or testbench feedback (Maddala et al., 2024, Mali et al., 2024).
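The iterative validation step can be sketched as a generate-validate-refine loop. Here `generate` and `validate` are caller-supplied stand-ins for the LLM and for a simulator or formal tool; this illustrates the control flow only and is not a binding to JasperGold or any real tool:

```python
def refine_assertions(spec, generate, validate, max_rounds=3):
    """Closed-loop refinement in the spirit of LAAG-RV/ChIRAAG: draft SVAs,
    validate them, and feed failure logs back into the next round.
    `generate(spec, feedback) -> list[str]`; `validate(sva) -> (ok, log)`."""
    feedback = None
    drafts = []
    for round_no in range(max_rounds):
        drafts = generate(spec, feedback)
        failures = [(sva, log) for sva in drafts
                    for ok, log in [validate(sva)] if not ok]
        if not failures:
            return drafts, round_no + 1      # all drafts passed
        feedback = "\n".join(log for _, log in failures)
    return drafts, max_rounds

# Hypothetical stand-ins: the "LLM" fixes its draft once it sees a log.
def _gen(spec, feedback):
    return ["good_sva"] if feedback else ["bad_sva"]

def _check(sva):
    ok = sva == "good_sva"
    return ok, "" if ok else "FPV: assertion fails at cycle 2"

drafts, rounds = refine_assertions("padder spec", _gen, _check)
```

Bounding the loop (`max_rounds`) matters in practice, since a non-trivial fraction of drafts never converges and must fall back to manual review.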

Spec2Assertion (Wu et al., 12 May 2025) further advances this by progressive regularization—using phase-wise prompt engineering to regularize extracted causal sentences, strip redundancy, and ensure syntactic and semantic correctness (92% syntax-correct SVAs vs. 68% for AssertLLM, 2× importance scores). This is achieved entirely at the prompt level, without model fine-tuning.

2.4 Cross-Layer Bridging and Knowledge Graphs

AssertGen (Lyu et al., 28 Sep 2025) and AssertionForge (Bai et al., 24 Mar 2025) address the alignment between abstract specification objectives and RTL signal hierarchies:

  • AssertGen extracts verification objectives from spec via CoT prompting, then traverses the RTL hierarchy to bridge abstract signals into specific module paths, constructing a “signal chain” linking spec-level behaviors to concrete netlists for SVA synthesis.
  • AssertionForge builds a unified hardware-specific knowledge graph by merging entities and relations parsed from both spec and RTL. This multi-resolution context—coarse (design summaries), mid (signal-specific retrieval), fine (graph walks)—is pruned and packaged for LLM prompts, increasing coverage and discovery of complex assertion contexts.
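The spec-to-RTL bridging step can be sketched as a traversal of the module hierarchy that resolves an abstract signal name into concrete hierarchical paths. This is a simplified illustration of the "signal chain" idea, not AssertGen's implementation; the hierarchy and names are hypothetical:

```python
def bridge_signal(hierarchy, top, signal):
    """Walk an RTL module hierarchy (module -> ports plus (instance, module)
    children) and return every hierarchical path at which `signal` appears,
    bridging a spec-level name to concrete netlist paths."""
    paths, stack = [], [(top, top)]
    while stack:
        module, path = stack.pop()
        node = hierarchy.get(module, {})
        for port in node.get("ports", []):
            if port == signal:
                paths.append(f"{path}.{port}")
        for inst, child in node.get("children", []):
            stack.append((child, f"{path}.{inst}"))
    return sorted(paths)

# Hypothetical SoC containing a SHA3 core with a padder submodule.
hierarchy = {
    "soc":    {"ports": [], "children": [("u_sha3", "sha3")]},
    "sha3":   {"ports": [], "children": [("u_pad", "padder")]},
    "padder": {"ports": ["pad_en", "padded_out"], "children": []},
}
paths = bridge_signal(hierarchy, "soc", "pad_en")
```

The resolved paths give the LLM concrete signal handles, so a spec-level behavior like "padding is enabled" can be bound to an assertable net deep in the design.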

3. Core Evaluation Metrics and Empirical Results

Evaluation of AI-guided assertion synthesis leverages a suite of formal and empirical metrics.

Hardware Assertion Synthesis

  • Syntax Correctness (fraction of SVAs accepted by formal tools or parsers).
  • FPV Pass Rate, Non-Vacuous Rate (NVR), and coverage metrics: Branch Coverage (BFC), Statement Coverage (SFC), Toggle Coverage (TFC), as computed by tools such as Cadence JasperGold (Lyu et al., 13 Nov 2025).
  • Cone of Influence (COI): Proportion of design logic constrained by assertions.
  • Functional and Mutation Detection Coverage: Number and fraction of single-point injected bugs detected (mutation testing), with up to 10–25% additional mutants detected over top-level only assertion sets (Lyu et al., 13 Nov 2025, Tian et al., 14 Jul 2025).
  • Average importance score (signal dependency graph depth) (Wu et al., 12 May 2025).
Tool/Method           Syntax Correct   NVR (%)   BFC    SFC    TFC
AssertMiner (I²C)     25/25            100       82.8   83.1   79.8
AssertMiner+Spec2A    -                >97       +2.7   +5.1   +4.2
AssertLLM             ~100             90        97     -      -
AssertCoder           97.8             88.0*     -      -      -

*Percentage of SVAs validated by FPV model checking.

Software Assertion Synthesis

  • Exact-match assertion accuracy, BLEU (n-gram), and CodeBLEU (AST/dataflow-aware) scores (Zhang et al., 22 Feb 2025).
  • Unique assertions generated (i.e., not present in any baseline).
  • Mutation score (fraction of mutants detected by assertions) (Terragni et al., 2021).
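Two of these metrics are simple to compute directly; a minimal sketch, where the whitespace normalization and the `kills` oracle signature are illustrative assumptions:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of generated assertions textually identical to the ground
    truth, after whitespace normalization."""
    hits = sum(" ".join(p.split()) == " ".join(r.split())
               for p, r in zip(predictions, references))
    return hits / len(references) if references else 0.0

def mutation_score(assertions, mutants, kills):
    """Fraction of injected mutants detected (killed) by at least one
    assertion; `kills(assertion, mutant)` is a caller-supplied oracle,
    e.g. backed by a test runner."""
    detected = sum(any(kills(a, m) for a in assertions) for m in mutants)
    return detected / len(mutants) if mutants else 0.0
```

Exact match is a strict lower bound on usefulness (a semantically equivalent but differently worded assertion scores zero), which is why CodeBLEU and mutation score are reported alongside it.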

Empirical results consistently demonstrate that hybrid, AI-guided methods outperform baselines: for instance, RetriGen improves over EditAS by +19.88% (accuracy), +2.79 CodeBLEU on “old” datasets, and AG-RAG achieves up to 3.45–9.2× more unique correct assertions (Zhang et al., 22 Feb 2025, Zhang et al., 15 Feb 2025).

4. Insights, Limitations, and Research Challenges

Key insights and limitations emerge across all AI-guided assertion synthesis methodologies:

  • Static AST/dataflow/knowledge graph guidance stabilizes LLM synthesis, yields higher functional coverage, and is critical to mining deep, module-level assertions not inferable from top-level specifications (Lyu et al., 13 Nov 2025, Bai et al., 24 Mar 2025).
  • Retrieval-augmentation and joint retriever-generator optimization unlock significant gains in software assertion accuracy and the discoverability of assertion patterns otherwise unreachable by token-only (lexical) or isolated generation schemes (Zhang et al., 15 Feb 2025, Zhang et al., 22 Feb 2025).
  • Progressive prompt-based regularization, CoT, and phase decomposition filter non-causal, duplicate, and syntax-incorrect SVAs prior to final assertion emission (Wu et al., 12 May 2025).
  • Closed-loop LLM-simulator or testbench pipelines (as in LAAG-RV and ChIRAAG) iteratively close gaps by refining assertion drafts in response to actual hardware or simulation errors, reducing the number of error-prone drafts and minimizing the manual debugging cycle (Maddala et al., 2024, Mali et al., 2024).
  • However, limitations persist:
    • Non-trivial fractions of generated assertions (17–54% in AssertMiner) initially fail FPV due to hallucination or over-generalization; corrective refinement is necessary (Lyu et al., 13 Nov 2025).
    • LLMs can miss deep corner cases or produce spurious assertions, especially for complex FSM or data-path modules with sparse or ambiguous specification (Lyu et al., 13 Nov 2025, Wu et al., 12 May 2025).
    • Scalability bottlenecks are acute: mutation testing and signal-bridging in large SoCs remain time-intensive (>120 hours per design in some cases) (Lyu et al., 13 Nov 2025, Lyu et al., 28 Sep 2025).
    • Effectiveness in designs with scarce or poorly-structured input specifications is limited.

5. Future Directions

Recent work points to several avenues for future research. The limitations catalogued above (costly corrective refinement, the scalability of mutation testing and signal bridging on large SoCs, and robustness to sparse or poorly structured specifications) delineate the principal open problems.

6. Impact and Comparative Summary

AI-guided assertion synthesis stands as a deployment-ready set of methodologies that have demonstrably enhanced both the coverage and efficiency of ABV flows for hardware and the robustness and semantic alignment of oracles in software testing. Integrating static, semantic, and learning-based techniques yields significant improvements over conventional human-authored, IR-only, or pure-generation assertion flows. Gains include enhanced mutant detection (up to +25%), higher structural coverage (BFC/SFC/TFC improvements of +2–5 points), functional-correctness increases (+8.4% for AssertCoder over AssertLLM), and substantial reductions in manual iteration cycles (LAAG-RV, ChIRAAG).

A convergence is apparent: robust assertion synthesis in both domains depends on the seamless integration of static semantic structure (AST/KG/guided context), retrieval of aligned patterns or context, and closed-loop or progressive machine-driven regularization—anchored by, but not reducible to, generic LLM text generation.

Principal references: (Lyu et al., 13 Nov 2025, Zhang et al., 22 Feb 2025, Zhang et al., 15 Feb 2025, Wu et al., 12 May 2025, Lyu et al., 28 Sep 2025, Bai et al., 24 Mar 2025, Yan et al., 2024, Tian et al., 14 Jul 2025, Maddala et al., 2024, Mali et al., 2024, Fang et al., 2024, Polgreen et al., 2020, Terragni et al., 2021).
