Clover: A Neural-Symbolic Agentic Harness with Stochastic Tree-of-Thoughts for Verified RTL Repair

Published 19 Apr 2026 in cs.AR and cs.AI | (2604.17288v1)

Abstract: RTL program repair remains a critical bottleneck in hardware design and verification. Traditional automatic program repair (APR) methods rely on predefined templates and synthesis, limiting their bug coverage. LLMs and coding agents based on them offer flexibility but suffer from randomness and context corruption when handling long RTL code and waveforms. We present Clover, a neural-symbolic agentic harness that orchestrates RTL repair as a structured search over code manipulations to find a validated solution for the bug. Recognizing that different repair operations favor distinct strategies, Clover dynamically dispatches tasks to specialized LLM agents or symbolic solvers. At its core, Clover introduces stochastic tree-of-thoughts, a test-time scaling mechanism that manages the main agent's context as a search tree, balancing exploration and exploitation for reliable outcomes. An RTL-specific toolbox further empowers agents to interact with the debugging environment. Evaluated on the RTL-repair benchmark, Clover fixes 96.8% of bugs within a fixed time limit, covering 94% and 63% more bugs than pure traditional and LLM-based baselines, respectively, while achieving an average pass@1 rate of 87.5%, demonstrating high reliability and effectiveness.


Authors (9)

Summary

The paper introduces a hybrid neural-symbolic framework that combines LLM abstract reasoning with SMT-driven low-level RTL repair.
Clover employs a stochastic Tree-of-Thoughts search to modulate hypothesis exploration, achieving 96.8% bug repair coverage and 87.5% pass@1 performance.
By integrating domain-specific RTL tools with adaptive agent orchestration, Clover tackles complex hardware debugging challenges with robust, verifiable repairs.

Clover: Neural-Symbolic Agentic Harness for Verified RTL Repair

Motivation and Context

Automatic program repair (APR) for register-transfer-level (RTL) hardware remains challenging because of the diversity of fault types and the structural complexity of hardware description languages. Traditional symbolic APR methods, such as RTL-Repair, offer rigor and precision but are limited by the finite scope of their predefined repair templates. Conversely, LLMs and LLM-driven coding agents offer flexibility at high levels of abstraction but are hindered by randomness, limited context windows, and susceptibility to context corruption when faced with dense RTL code and waveform data. The spectrum of program repair operations therefore calls for a hybrid approach: the high-level reasoning of LLMs is best suited to intent-centric repairs, while the low-level manipulation proficiency of symbolic solvers is indispensable for implementation-centric fixes.

Figure 1: Spectrum of repair operations delineating abstraction levels and matched APR solution modalities.

Clover Framework Architecture

Clover addresses the multi-step heterogeneity of RTL APR through a neural-symbolic architecture that integrates LLM agents and symbolic solvers. A main agent dynamically delegates subtasks to specialized sub-agents (context and lint-fix agents) and invokes SMT-based symbolic repair modules when needed. The main agent operates within a three-level nested loop (hypothesis generation, validation, and patch formation) that emulates the iterative reasoning of human debuggers. Sub-agents are equipped with RTL-specific tools (VCD viewers, linting servers, and language servers) that support precise navigation and contextual summarization of large RTL codebases.
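The three-level loop can be pictured with a minimal Python sketch. Everything here is hypothetical: `nested_repair_loop`, the hypothesis and patch types, and the toy XOR testbench are illustrative stand-ins rather than Clover's interfaces, and the agent's LLM and tool calls are replaced by plain callables.

```python
from typing import Callable, Iterable, Optional, TypeVar

H = TypeVar("H")  # hypothesis type (hypothetical)
P = TypeVar("P")  # patch type (hypothetical)


def nested_repair_loop(
    hypotheses: Iterable[H],
    checks: list[Callable[[H], bool]],
    propose_patches: Callable[[H], Iterable[P]],
    testbench: Callable[[P], bool],
) -> Optional[P]:
    """Return the first patch that passes the testbench, or None (sketch only)."""
    # Level 1: hypothesis generation (here a pre-ranked stream of guesses).
    for hyp in hypotheses:
        # Level 2: hypothesis validation, e.g. waveform or lint probes.
        if not all(check(hyp) for check in checks):
            continue  # hypothesis rejected; try the next one
        # Level 3: patch formation, each candidate checked by the testbench.
        for patch in propose_patches(hyp):
            if testbench(patch):
                return patch
    return None


# Toy usage: the spec is y = a ^ b, but the buggy RTL used a different operator.
OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def xor_testbench(op_name: str) -> bool:
    # Exhaustively compare the candidate operator against the spec.
    return all(OPS[op_name](a, b) == (a ^ b) for a in (0, 1) for b in (0, 1))


result = nested_repair_loop(
    hypotheses=["wrong width", "wrong operator"],
    checks=[lambda h: h == "wrong operator"],  # pretend only this one fits the waveform
    propose_patches=lambda h: list(OPS),
    testbench=xor_testbench,
)
print(result)  # xor
```

In Clover, the order in which hypotheses are explored is governed by the stochastic tree-of-thoughts search rather than by a fixed list as in this sketch.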

Figure 2: Clover framework overview, illustrating main agent coordination, sub-agent roles, RTL-specific tool mediation, and symbolic repair integration.

SMT-Based Symbolic Repair Extensions

Clover extends symbolic repair by making template selection part of the agent workflow. Rather than applying every template exhaustively in the backend, the main agent selects the template most relevant to the detected fault, formulates the repair as an SMT instance, and synthesizes structured source-level actions from the solution. This approach broadens coverage to cycle-shifting faults, i.e., temporal RTL bugs modeled by manipulating signals across cycles. A free variable $\phi$ controls the cycle delay, so the SMT solver can satisfy the assertions by selectively shifting signals in time; this mitigates the risk of combinational loops that naive symbolic approaches face.
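To make the role of $\phi$ concrete, the sketch below assigns a hypothetical per-signal delay $\phi_i \ge 0$ and searches for an assignment under which the output trace satisfies a cycle-by-cycle assertion. As an assumption made to keep the example dependency-free, exhaustive enumeration stands in for the SMT solver (an SMT encoding would instead make each $\phi_i$ a bounded integer variable); `delay`, `solve_cycle_shifts`, and the traces are illustrative and not taken from the paper.

```python
import itertools
from typing import Callable, Optional, Sequence


def delay(trace: Sequence[int], phi: int, reset_value: int = 0) -> list[int]:
    """Shift a trace later by `phi` cycles (phi >= 0 models added register stages)."""
    if phi < 0:
        raise ValueError("this sketch only models delays (phi >= 0)")
    n = len(trace)
    k = min(phi, n)
    # The first k cycles hold the register reset value.
    return [reset_value] * k + list(trace[: n - k])


def solve_cycle_shifts(
    signals: dict[str, Sequence[int]],
    logic: Callable[..., int],
    expected: Sequence[int],
    max_shift: int = 3,
) -> Optional[dict[str, int]]:
    """Find per-signal delays so that logic(shifted signals) matches `expected`.

    Exhaustive search stands in for an SMT solver here (an assumption).
    """
    names = list(signals)
    # Try assignments with the smallest total shift first (minimal edits).
    assignments = sorted(
        itertools.product(range(max_shift + 1), repeat=len(names)), key=sum
    )
    for phis in assignments:
        shifted = {name: delay(signals[name], phi) for name, phi in zip(names, phis)}
        outputs = [
            logic(*(shifted[name][t] for name in names)) for t in range(len(expected))
        ]
        # Assertion: the output must match the expected trace on every cycle.
        if outputs == list(expected):
            return dict(zip(names, phis))
    return None


# Toy bug: the RTL computes y = a & b, but the spec needs b delayed by one cycle.
a = [1, 1, 0, 1, 1, 1, 0, 1]
b = [1, 0, 1, 1, 0, 1, 1, 1]
expected = [x & y for x, y in zip(a, delay(b, 1))]
print(solve_cycle_shifts({"a": a, "b": b}, lambda x, y: x & y, expected))
# {'a': 0, 'b': 1}
```

Ordering candidates by total shift is one simple way to prefer the smallest temporal edit; an SMT formulation could express the same preference with an optimization objective.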

Figure 3: Cycle shifting mechanism and its corresponding SMT formulation for temporal RTL bug repair.

Stochastic Tree-of-Thoughts and Search Reliability

To counteract LLM stochasticity and improve the exploration-exploitation balance during the search for a repair, Clover implements a Stochastic Tree-of-Thoughts algorithm at inference time. The hypothesis tree organizes the dialogue states and code contexts proposed by the main agent, enabling structured exploration of alternative solution paths. A heuristic function $f(c, h)$, parameterized by testbench pass count, query/advisory frequency, error incidence, token consumption, and patching depth, guides stochastic expansion. The search samples nodes according to the softmax of $f(c, h)$, allocating compute efficiently and dynamically shifting focus between promising and under-explored hypotheses.
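The node-selection step can be sketched as a weighted score followed by a temperature-controlled softmax. The feature names mirror the five factors listed above, but the linear form of the heuristic, the weights in `WEIGHTS`, and the `temperature` parameter are hypothetical choices for illustration; the paper's exact $f(c, h)$ is not reproduced here.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NodeStats:
    tests_passed: int  # testbench pass count
    queries: int       # query/advisory frequency
    errors: int        # error incidence
    tokens: int        # token consumption
    depth: int         # patching depth


# Hypothetical weights: reward passing tests, penalize cost, errors, and depth.
WEIGHTS = {"tests_passed": 1.0, "queries": -0.2, "errors": -0.5, "tokens": -1e-4, "depth": -0.3}


def heuristic(stats: NodeStats) -> float:
    """Hypothetical linear stand-in for the paper's f(c, h)."""
    return sum(w * getattr(stats, name) for name, w in WEIGHTS.items())


def softmax(scores: Sequence[float], temperature: float = 1.0) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_node(
    nodes: Sequence[NodeStats],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> tuple[int, list[float]]:
    """Sample the index of the node to expand next from softmax(f)."""
    rng = rng or random.Random()
    probs = softmax([heuristic(n) for n in nodes], temperature)
    index = rng.choices(range(len(nodes)), weights=probs, k=1)[0]
    return index, probs


frontier = [
    NodeStats(tests_passed=8, queries=2, errors=0, tokens=20_000, depth=3),
    NodeStats(tests_passed=5, queries=1, errors=1, tokens=5_000, depth=1),
    NodeStats(tests_passed=2, queries=0, errors=3, tokens=1_000, depth=1),
]
idx, probs = sample_node(frontier, temperature=1.0, rng=random.Random(0))
print(idx, [round(p, 3) for p in probs])
```

Lower temperatures concentrate sampling on the best-scoring node (exploitation), while higher temperatures flatten the distribution so weaker or under-explored nodes are still expanded occasionally (exploration).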

Experimental Results and Analysis

Clover is evaluated on the RTL-repair benchmark suite against the RTL-Repair, MEIC, and UVLLM baselines. It repairs 96.8% of bugs, covering 94% and 63% more bugs than the traditional and LLM-driven baselines, respectively, and achieves an average pass@1 rate of 87.5%. The first 17 benchmarks (single-module, single-file) are addressed trivially; in more complex cases, Clover's multi-agent orchestration and template-guided symbolic repair systematically uncover root causes buried deep within module hierarchies and unconventional structures. Notably, Clover resolves algorithmic errors, combinational width faults, and handshaking mismatches through integrated analysis and trial-and-error reasoning, confirmed by waveform validation. Its repair reliability is further supported by synthetic benchmarks, which show generalization to semantically meaningless, randomly wired logic.
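Assuming the reported pass@1 follows the common unbiased pass@k estimator (with n runs per bug and c successful runs, pass@k = 1 - C(n-c, k)/C(n, k), which reduces to c/n for k = 1), the per-bug metric and its average can be computed as below. Treating coverage as "fixed in at least one run" is likewise an assumed definition, and the run counts are made-up illustrative data, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n runs with c successes."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-bug results: (runs, successful runs).
runs = {"bug_a": (4, 4), "bug_b": (4, 2), "bug_c": (4, 0)}

avg_pass1 = sum(pass_at_k(n, c, 1) for n, c in runs.values()) / len(runs)
coverage = sum(c > 0 for _, c in runs.values()) / len(runs)  # assumed definition
print(f"average pass@1 = {avg_pass1:.3f}, coverage = {coverage:.3f}")
# average pass@1 = 0.500, coverage = 0.667
```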

Figure 4: Ablation study results on complex benchmarks, quantifying the contribution of Tree-of-Thoughts and SMT module integration.

Ablation studies indicate that both the Stochastic Tree-of-Thoughts mechanism and SMT-based symbolic repair are necessary for tackling multifaceted bugs. Disabling either one lowers pass@1 and increases time and token consumption, affirming the synergy between neural-symbolic integration and structured agentic search.

Implications and Future Directions

Clover demonstrates the effectiveness of neural-symbolic integration in agentic harnesses for hardware program repair. Practically, Clover's architecture enables robust APR with minimal guidance in contemporary RTL workflows, handling both high-level and low-level repair operations at adaptive granularity. Conceptually, the framework makes a case for scalable agent architectures equipped with domain-specific toolboxes and algorithmic test-time scaling. The stochastic tree-search model lays groundwork for future work on resource-aware context engineering and hybrid orchestration strategies.

Potential developments include refining the heuristic function through adaptive or learned weighting, incorporating richer domain knowledge into agent toolboxes, extending to other hardware description paradigms (e.g., Chisel, SystemC), and integrating automated specification derivation for program synthesis. The demonstrated reliability and coverage suggest that Clover's neural-symbolic harness is a promising candidate for industrial-grade hardware design and verification flows, where it could shorten RTL debugging turnaround and improve hardware reliability.

Conclusion

Clover establishes a robust neural-symbolic agentic harness for verified RTL repair, integrating specialized LLM workflows, SMT-based symbolic templates, and a stochastic tree-of-thoughts search strategy. It achieves higher bug coverage and reliability than state-of-the-art APR methods, supporting the viability of coordinated, context-managed agent collaboration in complex hardware debugging and pointing toward scalable, verifiable AI-driven hardware design assistance.
