PADME: Procedure Aware Dynamic Execution
- The paper introduces an approach that transforms unstructured procedural text into a directed acyclic graph encoding steps, dependencies, and decision points.
- It employs a two-phase teach–execute methodology, separating expert-guided structuring from dynamic execution to reduce error accumulation.
- Empirical results on ALFWorld and ScienceWorld demonstrate PADME's superior performance in maintaining procedural alignment and task success.
Procedure Aware DynaMic Execution (PADME) is a general agent framework that transforms free-form procedural text into executable, structured decision graphs supporting robust, scalable, and context-sensitive long-horizon execution. It addresses core challenges in autonomous execution, such as error accumulation and drift, by separating systematic procedure structuring from dynamic execution and by providing an intermediate graph-based abstraction that encodes task dependencies, decision points, and subroutines in an explicit, machine-readable form. PADME is empirically validated on environments such as ALFWorld and ScienceWorld, where it outperforms existing baselines in maintaining correct procedural flow and in final task completion.
1. Structured Graph-Based Procedure Representation
PADME autonomously parses procedural text (for example, instructions, SOPs, or scientific protocols) into a directed acyclic graph (DAG), where:
- Each node represents a single procedural operation or step, categorized as Human Input, Information Processing, Information Extraction, Knowledge, or Decision.
- Nodes are associated with typed operator functions that specify their expected input/output interface.
- Edges encode explicit logical, temporal, and dependency relationships (control flow, data dependencies, conditional transitions).
- Decision nodes are parameterized by conditional distributions over their outgoing branches, conditioned on the execution history, enabling data- and history-dependent branching.
- Algorithmic traversal of the procedure graph (Algorithm 1 in Garg et al., 13 Oct 2025) ensures that all dependencies are satisfied before a node is activated and allows modular subgraph reuse.
By converting unstructured text to an AI-ready, intermediate graph representation, PADME provides both human readability (for expert validation) and machine-readability, enabling complex dynamic execution scenarios.
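The representation described above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: the names `NodeType`, `Node`, `ProcedureGraph`, and the operator signature (a callable from a dict of parent outputs to an output) are all assumptions made for this example. Only the five node categories come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class NodeType(Enum):
    # The five node categories named in the text.
    HUMAN_INPUT = "Human Input"
    INFO_PROCESSING = "Information Processing"
    INFO_EXTRACTION = "Information Extraction"
    KNOWLEDGE = "Knowledge"
    DECISION = "Decision"


@dataclass
class Node:
    name: str
    node_type: NodeType
    # Hypothetical operator: maps {parent name: parent output} to this node's output.
    operator: Callable[[Dict[str, object]], object]


@dataclass
class ProcedureGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (parent, child)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, parent: str, child: str) -> None:
        if parent not in self.nodes or child not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((parent, child))

    def parents(self, name: str) -> List[str]:
        return [p for p, c in self.edges if c == name]

    def is_acyclic(self) -> bool:
        # Kahn's algorithm: the graph is a DAG iff every node can be removed
        # once all of its parents have been removed.
        indegree = {n: 0 for n in self.nodes}
        for _, child in self.edges:
            indegree[child] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        visited = 0
        while ready:
            current = ready.pop()
            visited += 1
            for parent, child in self.edges:
                if parent == current:
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        ready.append(child)
        return visited == len(self.nodes)


# Toy procedure: "bake" depends on "mix".
graph = ProcedureGraph()
graph.add_node(Node("mix", NodeType.INFO_PROCESSING, lambda inputs: "batter"))
graph.add_node(Node("bake", NodeType.INFO_PROCESSING, lambda inputs: "cake"))
graph.add_edge("mix", "bake")
print(graph.is_acyclic(), graph.parents("bake"))
```

Keeping the acyclicity check on the graph itself means a malformed structure can be rejected before any execution is attempted, which matches the role of expert validation described above.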
2. Two-Phase Teach–Execute Methodology
PADME enforces a rigid separation between the construction and use of procedural knowledge:
- Teach Phase: Raw procedural text is processed by an LLM-based agent (Procedure Structuring Agent), which segments and abstracts text into local decision subgraphs. These are then merged into a global decision graph. Each node is annotated with compulsory metadata (e.g., names, descriptions, I/O schemas, dependencies, and semantic category). Non-actionable or ambiguous content is filtered during this pass, and nodes may be further enriched with bindings to APIs, code snippets, or external tool invocations.
- Execute Phase: At runtime, the execution agent traverses the validated decision graph in topological order, instantiating step nodes and dynamically selecting branches at Decision nodes using algorithmic policies informed by the current context (real-time inputs, agent observations, or environmental state).
This two-phase decomposition makes the expert-validated procedural structure reusable across instances, so execution need not repeat the error-prone process of free-form reasoning in every episode.
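The Execute phase can be illustrated with a short traversal sketch. It is not the paper's Algorithm 1. The function `execute`, the dict-based graph encoding, and the convention that a Decision node returns the name of the child it selects are assumptions made for this example. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def execute(steps, parents, decision_nodes, context):
    """Run a procedure graph in topological order.

    steps:          node name -> callable(inputs, context) -> output
    parents:        node name -> list of parent node names
    decision_nodes: names of nodes whose output is the name of the chosen child
    context:        runtime information (inputs, observations, environment state)
    """
    outputs = {}  # node name -> output of its operator
    trace = []    # nodes actually executed, in order
    for name in TopologicalSorter(parents).static_order():
        deps = parents.get(name, [])
        # A node runs only if every parent ran ...
        if any(p not in outputs for p in deps):
            continue
        # ... and every Decision parent selected this node as its branch.
        if any(p in decision_nodes and outputs[p] != name for p in deps):
            continue
        inputs = {p: outputs[p] for p in deps}
        outputs[name] = steps[name](inputs, context)
        trace.append(name)
    return trace, outputs


# Hypothetical oven procedure with one Decision node.
steps = {
    "read_temperature": lambda inputs, ctx: ctx["temperature_c"],
    "is_hot_enough": lambda inputs, ctx: (
        "proceed" if inputs["read_temperature"] >= 180 else "preheat"
    ),
    "preheat": lambda inputs, ctx: "oven heating",
    "proceed": lambda inputs, ctx: "baking started",
}
parents = {
    "read_temperature": [],
    "is_hot_enough": ["read_temperature"],
    "preheat": ["is_hot_enough"],
    "proceed": ["is_hot_enough"],
}
decision_nodes = {"is_hot_enough"}

trace, outputs = execute(steps, parents, decision_nodes, {"temperature_c": 150})
print(trace)  # ['read_temperature', 'is_hot_enough', 'preheat']
```

The same `steps` and `parents` can be reused with a different context (for example `{"temperature_c": 200}`), which is the point of the separation: the structure is built once and only the branch selection changes per episode. This simplified rule skips any node with a skipped parent, so branches that rejoin at a common node would need a more permissive activation rule.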
3. Inductive Bias and Error Accumulation Mitigation
The graph representation fundamentally constrains execution and reduces drift:
- Constrained Action Space: By connecting each step only to its allowed successors and enforcing explicit dependencies, PADME reduces the space of possible state transitions from one that grows exponentially with the number of steps (with the size of the action vocabulary as the base) to one bounded by the structure of the graph in worst-case traversal, preventing combinatorial explosion.
- Localized Parameterization: Each node's permissible input space is the output of its explicit parent nodes, minimizing the risk of cascading errors from invalid or misplaced free-form arguments.
- Structural Inductive Bias: Encoding procedural dependencies at graph construction time enforces step order and prerequisite satisfaction (for example, “bake” cannot occur before “mix”) directly in the graph, precluding errors inherent in purely sequential or string-based approaches.
- Error Mitigation: Empirical evidence from ALFWorld and ScienceWorld benchmarks demonstrates that decision graph-based execution maintains alignment with task structure for longer horizons, as reflected in metrics such as Prefix Match Length (PML) and Final Match (FM).
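The effect of the constrained action space can be made concrete with a toy count. The four-step recipe and its dependencies below are invented for illustration, and the brute-force count is not the complexity bound from the paper. It only shows how few step orderings survive once dependencies are enforced.

```python
from itertools import permutations

# Hypothetical four-step procedure and its prerequisites.
steps = ["gather", "mix", "preheat", "bake"]
requires = {"mix": {"gather"}, "bake": {"mix", "preheat"}}


def respects_dependencies(order):
    """True if every step appears after all of its prerequisites."""
    done = set()
    for step in order:
        if not requires.get(step, set()) <= done:
            return False
        done.add(step)
    return True


# Unconstrained: any of the 4 actions may be chosen at each of the 4 positions.
unconstrained = len(steps) ** len(steps)

# Constrained: each step used once, in an order the dependency graph permits.
valid = sum(1 for order in permutations(steps) if respects_dependencies(order))

print(unconstrained, valid)  # 256 3
```

Here "bake" must come last and "gather" must precede "mix", so only the position of "preheat" is free, leaving three valid orderings out of 256 unconstrained length-four action sequences.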
4. Empirical Evaluation Across Benchmarks
PADME's effectiveness is demonstrated in four benchmarks, including complex simulated domains:
- ALFWorld: PADME attains higher Prefix Match Length, Prefix Accuracy, Sequential Match, and Final Match scores than baselines such as Act-Only, ReAct, Chain-of-Thought, and SPRING. Notably, a Final Match (FM) score near 0.69 indicates robust completion of procedural objectives even in the presence of long-term dependencies.
- ScienceWorld: Despite higher task flexibility, PADME outperforms unstructured baselines in maintaining procedural alignment, as indicated by superior PML and PA scores.
- Generalization: PADME generalizes well across a range of domains, underscoring the value of explicit structure for handling highly variable, multi-step tasks.
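The text names the alignment metrics without defining them. The sketch below shows one plausible reading of two of them and should be treated as an assumption, not as the paper's definitions: Prefix Match Length is taken to be the number of leading steps on which the predicted and reference sequences agree, and Final Match to be whether the last steps agree. The function names and the example action strings are hypothetical.

```python
def prefix_match_length(predicted, reference):
    """Assumed definition: count of leading positions where both sequences agree."""
    length = 0
    for p, r in zip(predicted, reference):
        if p != r:
            break
        length += 1
    return length


def final_match(predicted, reference):
    """Assumed definition: True if both sequences are non-empty and end in the same step."""
    return bool(predicted) and bool(reference) and predicted[-1] == reference[-1]


reference = ["go to fridge", "open fridge", "take apple", "close fridge"]
predicted = ["go to fridge", "open fridge", "take milk", "close fridge"]

print(prefix_match_length(predicted, reference))  # 2
print(final_match(predicted, reference))          # True
```

Under these assumed definitions the two metrics capture different failure modes: the example drifts at the third step (a short matching prefix) yet still ends on the correct final step.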
5. Application Domains and Scalability
PADME's architectural choices support scalable, multi-domain deployment:
- Domains of Application:
  - Business workflows and SOP execution: Conversion of complex, regulated procedures (manufacturing, regulatory compliance) into reusable and executable graphs.
  - Scientific experimentation: Enforcement of conditional protocols (e.g., error handling in laboratory automation).
  - Recipe and culinary automation: Robust execution of complex, branching food preparation routines.
  - Robotic and household task automation: Dynamic adaptation to live inputs and environment state during multi-step tasks.
- Scalability:
  - One-time global structure construction during the Teach phase supports repeated, parameterized execution in diverse settings with minimal per-instance engineering.
  - Modular graph construction supports incremental expansion and straightforward accommodation of domain extensions by attaching or rerouting subgraphs and tool bindings.
6. Comparison to Prior Dynamic Execution Approaches
Relative to related systems:
- Manual vs. Automatic Structuring: Unlike AgentKit or SPRING, which depend on labor-intensive manual graph construction and domain decomposition, PADME automates the structuring process, facilitating rapid scale-up to new procedural domains.
- Structured Intermediate Abstraction: Methods such as ReAct, CoT, or Act-Only reason over free-form text sequences, which increases the risk of drift and error accumulation, especially for long-horizon or branching procedures. PADME instead executes over an explicit intermediate graph.
- Dynamic Context-Sensitivity: PADME’s execution framework dynamically adapts at Decision nodes in response to run-time inputs and state, supporting context-sensitive branching that static sequential baselines cannot robustly deliver.
- Empirical Superiority: Across evaluated metrics and domains, PADME consistently maintains better alignment with intended procedural sequences and achieves higher final task success rates.
7. Relationship to Graph-Based Models in Other Modalities
Recent work on graph-based procedural understanding for instructional video (e.g., pretraining based on a procedural knowledge graph (PKG) in Paprika (Zhou et al., 2023)) integrates graph structure to improve downstream performance on tasks such as step recognition and forecasting. The structural bias provided by graph induction in video pretraining aligns with PADME's observation that imposing and exploiting explicit procedure graphs leads to more robust dynamic execution. Both approaches highlight the value of graph abstraction, whether as a pre-trained representation or as an execution scaffold, for supporting the nuanced reasoning required in long-horizon, multi-modal procedures.
Conclusion
PADME establishes a paradigm in which free-form procedural knowledge is first automatically structured into decision graphs and only then subjected to adaptive, context-sensitive execution. This separation, together with explicit graph modeling of dependencies, decisions, and tool interactions, yields a robust, scalable foundation for generalizable autonomous procedure execution. Empirical validation confirms that graph-based intermediate abstractions substantially reduce error in long-horizon reasoning and enable resilient adaptation to dynamic inputs, positioning PADME as a central framework in the advancement of dynamic intelligent agents for procedure-intensive domains (Garg et al., 13 Oct 2025).