Plan-Verify-Fill (PVF) Paradigm

Updated 25 January 2026

PVF is a three-phase computational paradigm that integrates plan synthesis, formal verification, and targeted correction across multiple domains.
It operates through iterative cycles where candidate solutions are planned, rigorously validated, and then repaired using counter-example guided refinement.
Reported evaluations show up to 90% recall in embodied plan verification, a 40–65% reduction in function evaluations for diffusion language models, and improved user-rated plan quality in end-user planning.

Plan-Verify-Fill (PVF) is a three-phase computational paradigm for structured reasoning, plan synthesis, and quality assurance in domains ranging from formal verification to embodied AI, diffusion language modeling, and interactive planning. Each instance of PVF instantiates domain-adapted “Plan,” “Verify,” and “Fill” stages, which collectively translate intuitive goals into precise artifacts, ensure those artifacts meet rigorously specified criteria, and remediate any detected deficiencies via targeted reparative steps or symbolic feedback propagation.

1. Foundational Concepts and Logical Structure

PVF operationalizes a cyclical workflow comprising:

Plan: Generation of candidate solutions (e.g., system plans, action sequences, text outputs) based on domain knowledge, user inputs, or learned models.
Verify: Application of formal or heuristic validation protocols to detect logical inconsistencies, specification violations, or qualitative deficiencies.
Fill: Remediation phase whereby counter-examples, error annotations, or violated constraints drive targeted refinements—either to the plan itself, underlying domain representations, or verification properties.
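The cycle above can be sketched as a small driver loop. This is a minimal illustration rather than any cited system's implementation: the names `pvf_loop`, `toy_plan`, `toy_verify`, and `toy_fill` are hypothetical, issues are modeled as plain strings, and the stopping rule (no issues left, no reduction in issue count, or an iteration cap) is one plausible reading of "error count stabilization or capped iteration depth".

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # plan type; hypothetical, any artifact the domain uses
Issue = str       # hypothetical issue representation


def pvf_loop(
    plan: Callable[[], P],
    verify: Callable[[P], List[Issue]],
    fill: Callable[[P, List[Issue]], P],
    max_iters: int = 5,
) -> Tuple[P, List[Issue], int]:
    """Plan once, then alternate Fill and Verify until the candidate is
    clean, a repair fails to reduce the issue count, or max_iters is hit."""
    candidate = plan()
    issues = verify(candidate)
    iters = 0
    while issues and iters < max_iters:
        iters += 1
        repaired = fill(candidate, issues)
        new_issues = verify(repaired)
        if len(new_issues) >= len(issues):
            break  # no progress: keep the previous candidate
        candidate, issues = repaired, new_issues
    return candidate, issues, iters


# Toy domain (assumption): a plan is valid once it contains every required step.
REQUIRED = ["boil water", "add tea", "pour"]


def toy_plan() -> List[str]:
    return ["boil water"]


def toy_verify(steps: List[str]) -> List[Issue]:
    return [s for s in REQUIRED if s not in steps]


def toy_fill(steps: List[str], issues: List[Issue]) -> List[str]:
    return steps + [issues[0]]  # repair one reported issue per cycle
```

Calling `pvf_loop(toy_plan, toy_verify, toy_fill)` repairs one missing step per cycle and stops as soon as verification reports no issues. Keeping the previous candidate when a repair makes no progress is a design choice of this sketch, not something the source prescribes.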

In formal software verification, PVF is tightly coupled with Linear-time Temporal Logic (LTL) frameworks. Verification properties are expressed as closed LTL formulas (φ), using operators such as □ (always), ◇ (eventually), and ○ (next time), together with domain-specific rules and equivalences (Winikoff, 2019). In this context, PVF systematically extracts verification properties from high-level tenets, utilizing a refinement tree architecture that alternates between direct formalization, domain-knowledge rule expansion, goal-tree decomposition, and dynamic knowledge elicitation.
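To make the temporal operators concrete, the following sketch evaluates □, ◇, and ○ over a finite trace. Several assumptions apply: LTL is normally interpreted over infinite traces, so the finite-trace reading used here (with a "strong" next that is false at the last position) is a simplification; the tuple encoding of formulas and the function name `holds` are hypothetical; and the example property is a propositional version of a reminder-style rule, □(time → (eating ∨ ○ remind)).

```python
from typing import Sequence, Set


def holds(f: tuple, trace: Sequence[Set[str]], i: int = 0) -> bool:
    """Evaluate formula `f` at position `i` of a finite trace.

    Each trace element is the set of atomic propositions true at that step.
    Formulas are nested tuples such as ("always", ("ap", "safe")).
    """
    op = f[0]
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "implies":
        return (not holds(f[1], trace, i)) or holds(f[2], trace, i)
    if op == "next":  # strong next: false when there is no successor step
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "always":
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "eventually":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op!r}")


# Propositional reminder property: always (time -> (eating or next remind)).
PHI = (
    "always",
    ("implies", ("ap", "time"),
     ("or", ("ap", "eating"), ("next", ("ap", "remind")))),
)

good_trace = [{"time", "eating"}, {"time"}, {"remind"}]
bad_trace = [{"time"}, set()]  # mealtime passes with no eating and no reminder
```

Here `holds(PHI, good_trace)` succeeds because every mealtime step is either accompanied by eating or followed by a reminder, while `holds(PHI, bad_trace)` fails at the first step. A real model checker works on the system model rather than on individual traces; this evaluator only illustrates what the operators mean.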

2. Algorithmic Instantiations in Planning and Verification

PVF’s architecture generalizes across diverse computational domains, with representative instantiations including:

LLM-based Plan Verification: PVF employs a Planner LLM to generate an initial action sequence, a Judge LLM to critique the plan (removing erroneous steps and flagging missing actions), and a deterministic Fill operation to update the plan. Iterative application leads to progressively cleaner, more coherent trajectories (Hariharan et al., 2 Sep 2025). Stopping criteria typically rely on error count stabilization or capped iteration depth.
Diffusion LLMs: PVF enables parallel skeleton-building and speculative decoding by first planning high-leverage semantic anchors, followed by robust verification protocols that ensure planning commits do not contradict high-confidence regions, and AR fallback filling where further planning yields diminishing returns (Li et al., 18 Jan 2026).
End-User Planning (VeriPlan): PVF combines LLM plan synthesis (Plan), model checking (Verify), and translation of natural-language rules into formal constraints (Fill). User-tuned flexibility sliders modulate the "hardness" of constraints, permitting tradeoffs between strictness and feasibility. Violations discovered during verification trigger corrective LLM replanning, closing the loop (Lee et al., 25 Feb 2025).
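For the LLM-based plan verification instantiation, the deterministic Fill step can be sketched as a pure function over the judge's annotations. The annotation schema below (an `Annotation` record with `kind`, `index`, and `action`) and the function name `apply_fill` are assumptions made for illustration; the source only states that the judge removes erroneous steps and flags missing actions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical critique record emitted by a judge model."""
    kind: str                     # "REMOVE" or "MISSING"
    index: int                    # position in the ORIGINAL plan
    action: Optional[str] = None  # step to insert; required for MISSING


def apply_fill(plan: List[str], annotations: List[Annotation]) -> List[str]:
    """Apply REMOVE/MISSING annotations without calling any model.

    REMOVE drops the step at `index`; MISSING inserts `action` before the
    original step at `index` (index == len(plan) appends at the end).
    Indexing against the original plan makes the result independent of
    the order in which removals and insertions are listed.
    """
    removed = {a.index for a in annotations if a.kind == "REMOVE"}
    inserts: Dict[int, List[str]] = {}
    for a in annotations:
        if a.kind == "MISSING":
            if a.action is None:
                raise ValueError("MISSING annotation needs an action")
            inserts.setdefault(a.index, []).append(a.action)

    revised: List[str] = []
    for i in range(len(plan) + 1):
        revised.extend(inserts.get(i, []))
        if i < len(plan) and i not in removed:
            revised.append(plan[i])
    return revised


plan = ["pick up mug", "open fridge", "pour coffee"]
critique = [
    Annotation("REMOVE", 1),
    Annotation("MISSING", 2, "place mug under machine"),
]
revised = apply_fill(plan, critique)
```

In this example the irrelevant "open fridge" step is dropped and the flagged action is inserted before "pour coffee". Because the function is deterministic, repeated Verify passes see exactly the edits the judge requested and nothing else.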

Table 1. PVF Algorithmic Elements Across Domains

Domain	Plan Component	Verify Component	Fill Component
Formal Verification	Goal/tenet extraction	Model checking (LTL)	Counter-example guided repair
LLM Embodied Planning	Action sequence LLM	LLM critique (REMOVE/MISSING)	Deterministic plan revision
Diffusion LLMs	Planning token commit	Filtered verification (context)	AR fallback & parallel fill
End-User Planning (VeriPlan)	LLM plan synthesis	Model checking (PRISM/Stormpy)	Rule translator, flexibility sliders

3. Mathematical Definitions, Properties, and Verification Criteria

In each PVF instantiation, the mathematical basis of the verification step is tailored to the domain:

LTL-Based Properties: Verification properties in the formal approach are expressed as φ ∈ LTL, e.g., φ(X) ≡ □[time(X) → (eating(X) ∨ ○ remind(X))] for safety reminders (Winikoff, 2019).
Recall and Precision: In LLM-based PVF, error detection efficacy is quantified via $\text{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$ and $\text{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$ (Hariharan et al., 2 Sep 2025).
Quantitative Stopping Rules: Structured PVF approaches such as in diffusion LLMs employ a global confidence measure $V(\mathbf{y})$, halting planning when $\Delta V < \epsilon$ (structural saturation) (Li et al., 18 Jan 2026).
Kripke Structures in Model Checking: VeriPlan represents plans as finite-state Kripke structures $M = (S, S_0, A, T, L)$, verifies them against user-defined LTL constraints, and propagates feedback iteratively (Lee et al., 25 Feb 2025).
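As an illustration of checking a plan modeled as a Kripke structure, the sketch below verifies a simple invariant (a property of the form □p) by breadth-first search and returns a counter-example path when the invariant fails. It is a hand-rolled reachability check, not the PRISM/Stormpy pipeline named in Table 1, and it covers only invariants rather than full LTL. The state names, the `within_budget` proposition, and the function `check_invariant` are hypothetical; S and A are left implicit in the keys and action names of the transition map.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

State = str
Action = str


def check_invariant(
    initial: Set[State],                                    # S_0
    transitions: Dict[State, List[Tuple[Action, State]]],   # T, labeled by A
    labels: Dict[State, Set[str]],                          # L
    prop: str,
) -> Optional[List[State]]:
    """Return None if `prop` labels every reachable state; otherwise return
    a shortest path from an initial state to a violating state."""
    parent: Dict[State, Optional[State]] = {s: None for s in initial}
    queue = deque(sorted(initial))
    while queue:
        s = queue.popleft()
        if prop not in labels.get(s, set()):
            path: List[State] = []
            cur: Optional[State] = s
            while cur is not None:  # walk parents back to an initial state
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for _action, t in transitions.get(s, []):
            if t not in parent:  # first visit gives the shortest path
                parent[t] = s
                queue.append(t)
    return None


# Toy travel plan (assumption): booking the hotel exceeds the budget.
T = {
    "start": [("buy_tickets", "tickets_bought")],
    "tickets_bought": [("book_hotel", "hotel_booked")],
}
L = {
    "start": {"within_budget"},
    "tickets_bought": {"within_budget"},
    "hotel_booked": set(),
}
counter_example = check_invariant({"start"}, T, L, "within_budget")
```

The returned path, from `start` to `hotel_booked`, is the kind of counter-example that the Fill phase can hand back to a planner or user as feedback.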

4. Empirical Evaluation and Performance Outcomes

PVF has demonstrated quantifiable improvements across settings:

Embodied Task Verification: Iterative LLM-based PVF achieved up to 90% recall and 100% precision, converging within three iterations in 96.5% of cases on annotated TEACh dataset trajectories (Hariharan et al., 2 Sep 2025). It outperforms static rule-based methods on both metrics (see the tables in the cited work).
Diffusion Text Generation: PVF reduced the Number of Function Evaluations (NFE) by 40–65% over leading baselines, maintaining or slightly improving task accuracy (e.g., GSM8K: NFE from 512 to 31.3, accuracy 79.62% vs 79.46%) (Li et al., 18 Jan 2026).
End-User Planning Usability: In a user study (n=12), full PVF (including translator, sliders, checker) significantly outperformed baseline and ablated conditions in perceived plan performance, usefulness, and satisfaction (statistical comparisons reported; e.g., C1 ≫ C3, p=.0011) (Lee et al., 25 Feb 2025).
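The recall and precision figures reported for error detection follow the standard definitions, TP / (TP + FN) and TP / (TP + FP). The short sketch below computes both from raw counts. The counts in the example are purely illustrative, chosen to show how a 90% recall and 100% precision outcome could arise; they are not taken from the cited evaluation, and `precision_recall` is a hypothetical helper name.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute (precision, recall) from true-positive, false-positive, and
    false-negative counts, returning 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only: 9 errors caught, 0 false alarms, 1 error missed.
precision, recall = precision_recall(tp=9, fp=0, fn=1)
```

With these made-up counts the judge never flags a correct step (precision 1.0) but misses one of ten true errors (recall 0.9).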

5. Knowledge Elicitation, Feedback, and Correction Dynamics

The “Fill” phase in PVF systematically “repairs” detected violations:

Counter-Example Handling: In verification-centric workflows, a counter-example trace may expose a plan or design fault that requires updating the model, prompt the elicitation of a missing assumption, or lead to the relaxation of an overly strong property (e.g., probabilistic thresholds, time bounds) (Winikoff, 2019).
Iterative Refinement: PVF in LLM planning and end-user systems utilizes explicit feedback (e.g., human-checked violations, REMOVE/MISSING annotations) concatenated to prompts for corrective LLM regeneration, leveraging the human–LLM–symbolic triad for improved solution quality (Hariharan et al., 2 Sep 2025, Lee et al., 25 Feb 2025).
Action Recovery Patterns: PVF frameworks that preserve correction sequences, rather than collapsing them into an ideal trajectory, support the retention of human error-recovery behavior, enhancing robustness and teaching resilience (Hariharan et al., 2 Sep 2025).
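The iterative refinement pattern, in which verifier feedback is concatenated to the prompt for regeneration, can be sketched as a plain string-building step. The prompt wording and the function name `build_repair_prompt` are assumptions for illustration; the cited systems' actual prompts are not reproduced here, and no model call is made.

```python
from typing import List


def build_repair_prompt(goal: str, plan: List[str], violations: List[str]) -> str:
    """Assemble a corrective prompt from the goal, the current plan, and the
    violations reported by the Verify phase (hypothetical wording)."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(plan, start=1))
    issues = "\n".join(f"- {v}" for v in violations)
    return (
        f"Goal: {goal}\n\n"
        f"Current plan:\n{steps}\n\n"
        f"Verifier feedback:\n{issues}\n\n"
        "Revise the plan so that every issue above is resolved. "
        "Return the full revised plan as a numbered list."
    )


prompt = build_repair_prompt(
    goal="Weekend trip under budget",
    plan=["book flight", "book hotel"],
    violations=["Total cost exceeds the budget constraint"],
)
```

Keeping the feedback in a dedicated section of the prompt lets the same template carry human-checked violations, REMOVE/MISSING annotations, or model-checker counter-examples without changing the surrounding loop.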

6. Domain-Specific Adaptations, Limitations, and Roadmap

Formal Verification: PVF enables systematic extraction and formalization of properties but scaling to industrial-size systems and probabilistic requirements remains open (Winikoff, 2019).
Embodied AI: Current PVF implementations are limited to a slice of household tasks; generalization to other domains requires further study. Computational latency from large model calls and annotation subjectivity are noted constraints (Hariharan et al., 2 Sep 2025).
Diffusion Decoding: Node selection schemas and hyperparameter sensitivity require task-specific tuning; the assumption of efficient batch parallelism may not universally hold (Li et al., 18 Jan 2026).
Human-Interactive Planning: Component-wise ablation studies reveal distinct, significant contributions of each phase. Restatements of natural-language rules as formal constraints and adaptive constraint “softening” yield improved quality and satisfaction, without negatively impacting ease of use (Lee et al., 25 Feb 2025).

A plausible implication is that future PVF paradigms will progressively integrate joint learning of planning anchors, multi-modal sequence generation, and increasingly sophisticated planning score functions, while incorporating explicit uncertainty modeling and resource-efficient verification protocols.

PVF constitutes a unifying meta-framework rather than a standalone algorithm. It subsumes and extends prior cyclic verification-planning schemes, offering model-agnostic, rigorously interlocked feedback cycles applicable across symbolic, neural, and interactive computation. Future work may focus on tool support for industrial application, scaling to large models and corpora, and integrating visual/simulator feedback, as well as continued formalization of property derivation and user-grounded verification property specification (Winikoff, 2019, Lee et al., 25 Feb 2025, Hariharan et al., 2 Sep 2025, Li et al., 18 Jan 2026).