Mermaid-based Instruction Graphs
- The paper presents a novel prompt-engineering method that encodes LLM reasoning workflows as deterministic Mermaid-based acyclic graphs to boost accuracy and efficiency.
- The methodology decomposes tasks into atomic steps, assigns conditional transitions via edge guards, and validates outputs to ensure error-free execution.
- Empirical results demonstrate substantial performance-per-dollar gains, with structured Mermaid graphs outperforming traditional chain-of-thought approaches.
Mermaid-based instruction graphs constitute a structured, machine-readable framework for encoding and executing complex reasoning plans in LLM prompt engineering. Developed in the context of bounded reasoning for autonomous inference and decision-making, they formalize the construction and traversal of logic workflows as directed acyclic graphs (DAGs) annotated in Mermaid, a widely used textual diagramming language. This representation enables stepwise, cost-efficient, and deterministic control of LLM-powered agents, substantially improving both accuracy and performance-per-dollar compared to unstructured, free-form chain-of-thought prompting (Amcalar et al., 17 Dec 2025). The methodology bridges graphical task decomposition and computational execution by LLMs, with roots in formal instruction-sequence representations of graphs (Lopez-Rubio, 11 Dec 2025) and textual graph formalism via Mermaid.
1. Formal Definition of Mermaid-Based Instruction Graphs
A Mermaid-based instruction graph is a 4-tuple G = (V, E, ℓ, γ) where:
- V denotes the set of nodes, each encoding an atomic reasoning step or constraint validation.
- E ⊆ V × V defines directed edges (u, v), representing dependencies or conditional transitions.
- ℓ : V → Σ* maps nodes to their Mermaid node labels, with recommended bounds on label length (empirically 15–20 tokens).
- γ assigns explicit guard or condition expressions to edges.
Bounding is imposed on token budgets per node and globally: |ℓ(v)| ≤ B_node for every v ∈ V, and Σ_{v ∈ V} |ℓ(v)| ≤ B_total.
The graph is typically acyclic except for explicit verification or feedback loops. The canonical Mermaid syntax begins with:

```mermaid
flowchart TD;
A[label_A] -- "cond" --> B[label_B]
```
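The 4-tuple can be sketched as a small data structure that serializes back to the canonical syntax above. This is a minimal illustration; the class and field names are ours, not the paper's:

```python
from dataclasses import dataclass, field

@dataclass
class InstructionGraph:
    """Sketch of G = (V, E, l, g): a label map and a guarded edge list."""
    labels: dict = field(default_factory=dict)  # node id -> Mermaid label (l)
    edges: list = field(default_factory=list)   # (u, v, guard-or-None) triples (E, g)

    def add_node(self, node_id, label):
        # Keep labels atomic: the paper recommends roughly 15-20 tokens per node.
        assert len(label.split()) <= 20, "node label exceeds the atomicity bound"
        self.labels[node_id] = label

    def add_edge(self, u, v, guard=None):
        self.edges.append((u, v, guard))

    def to_mermaid(self):
        # Serialize to the canonical flowchart syntax shown above.
        lines = ["flowchart TD;"]
        for u, v, guard in self.edges:
            arrow = f'-- "{guard}" -->' if guard else "-->"
            lines.append(f"{u}[{self.labels[u]}] {arrow} {v}[{self.labels[v]}]")
        return "\n".join(lines)

g = InstructionGraph()
g.add_node("A", "label_A")
g.add_node("B", "label_B")
g.add_edge("A", "B", "cond")
print(g.to_mermaid())
```

Round-tripping through `to_mermaid` reproduces the canonical two-line form, which is what makes the representation prompt-embeddable.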
2. Construction Methodology
Graph construction for reasoning tasks proceeds as follows (as instantiated for AdvancedIF, GSM-Hard, and SCALE MultiChallenge):
- Fact and Constraint Extraction: Identify all background facts F = {f_1, …, f_m} and task constraints C = {c_1, …, c_k}.
- Node Allocation: For each fact f_i and constraint c_j, create nodes with labels, e.g., [Fact: f_i], [Constraint: c_j].
- Decomposition: Disaggregate the overall reasoning procedure into atomic subtasks (e.g., “Outline plan”, “Perform arithmetic”, “Verify tone”).
- Edge and Logic Specification: For each logical or conditional dependency, add directed edges labeled by branching conditions or guard formulas.
- Verification Funnel: Append verification nodes that aggregate outputs; all verifications must pass to reach the End node, enforcing correctness.
Example (toy arithmetic):
```mermaid
flowchart TD;
A[Parse input: "23+47"] --> B[Compute sum operation];
B -- "if masked" --> C[Perform addition manually];
B -- "else" --> D[Retrieve precomputed value];
C --> E[Output result];
D --> E;
```
3. Execution and Traversal Algorithm
Given a prompt-embedded instruction graph G and an input question Q, execution is a node-wise traversal adhering strictly to the DAG topology:
- Initialize at the start node v_start ("Read question").
- At each node v, emit a system prompt: "You are given a plan node: ℓ(v). Execute this step." Obtain LLM output o_v.
- For each outgoing edge (v, u) ∈ E, evaluate the edge guard γ(v, u) on o_v. On success, transition v → u; otherwise, continue to the next edge.
- If no edge guard is satisfied, raise a TopologyError and trigger subgraph re-generation.
- Repeat until v = v_end. Aggregate the result R = o_{v_end}.
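The traversal loop can be sketched as follows; `run_llm` and `eval_guard` are caller-supplied stand-ins for the LLM call and the guard evaluator, since the paper does not fix those interfaces:

```python
class TopologyError(RuntimeError):
    """Raised when no outgoing edge guard is satisfied during traversal."""

def traverse(labels, edges, run_llm, eval_guard, start="Start", end="End"):
    """Walk the plan graph node by node.

    labels: node -> Mermaid label; edges: node -> list of (guard-or-None, successor).
    run_llm and eval_guard are hypothetical stand-ins, not fixed by the paper.
    """
    v, outputs = start, {}
    while True:
        prompt = f"You are given a plan node: {labels[v]}. Execute this step."
        outputs[v] = run_llm(prompt)
        if v == end:
            return outputs[end]          # aggregate result R = o_{v_end}
        for guard, successor in edges.get(v, []):
            if guard is None or eval_guard(guard, outputs[v]):
                v = successor            # deterministic transition v -> u
                break
        else:
            # No satisfiable guard: caller should trigger subgraph re-generation.
            raise TopologyError(f"no satisfiable edge out of node {v!r}")
```

Calling `traverse` with a stub `run_llm` that echoes each prompt walks the DAG deterministically, which is what makes the execution traceable.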
Token bounding equations for inference cost:
- Solve-only: T_solve = Σ_{v ∈ path} (|ℓ(v)| + |o_v|)
- Amortized: T_amortized = T_gen / N + T_solve, where T_gen is the one-time graph-generation cost and N the number of queries reusing the cached graph.
- General model cost: Cost = p_in · T_in + p_out · T_out,
with model-specific per-token pricing p_in, p_out.
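A worked sketch of this cost accounting; all token counts and per-token prices below are illustrative placeholders, not figures from the paper:

```python
def solve_cost(plan_tokens, output_tokens, p_in, p_out):
    # General model cost: input tokens (the embedded plan) plus output tokens.
    return plan_tokens * p_in + output_tokens * p_out

def amortized_cost(gen_cost, n_queries, per_query_cost):
    # One-time graph-generation cost T_gen spread over N reusing queries.
    return gen_cost / n_queries + per_query_cost

# Hypothetical numbers: $2.00 to generate the graph, reused over 1000 queries,
# 500 plan tokens in / 200 tokens out at $0.05 / $0.40 per million tokens.
per_query = solve_cost(500, 200, 0.05e-6, 0.40e-6)
total = amortized_cost(2.00, 1000, per_query)
print(round(total, 6))  # amortized dollar cost per query
```

The generation cost dominates at small N and vanishes as the cached graph is reused, which is the amortization effect the equations capture.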
4. Quantitative Performance Gains
Empirical evaluation demonstrates that BRAID instruction graphs, instantiated in Mermaid, provide substantial gains in both accuracy and inference efficiency relative to classic chain-of-thought (CoT) prompting:
Table 1. Accuracy Gains
| Dataset | Model Tier | Classic (%) | BRAID (%) | Δ points |
|---|---|---|---|---|
| GSM-Hard | gpt-5-nano-minimal | 94.0 | 98.0 | +4.0 |
| SCALE MultiChallenge | gpt-4o | 19.9 | 53.7 | +33.8 |
| AdvancedIF | gpt-5-nano-minimal | 18.0 | 40.0 | +22.0 |
Table 2. Performance-per-Dollar (PPD) Example (GSM-Hard)
| Gen → Solve | Accuracy (%) | PPD (vs. gpt-5-medium=1.0) |
|---|---|---|
| gpt-4.1 → gpt-5-nano-minimal | 96.0 | 74.06 |
| gpt-5-medium → gpt-5-medium | 99.0 | 1.00 |
PPD is defined as PPD = Accuracy / Cost, normalized so that the gpt-5-medium → gpt-5-medium configuration scores 1.0.
Thus, BRAID achieves up to 74× improvements in performance-per-dollar over unconstrained prompting paradigms on some tasks (Amcalar et al., 17 Dec 2025).
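The normalization can be checked mechanically; the cost ratio below is a hypothetical placeholder, chosen only to show the ratio structure:

```python
def ppd(accuracy, cost, baseline_accuracy, baseline_cost):
    """Performance-per-dollar, normalized so the baseline configuration scores 1.0."""
    return (accuracy / cost) / (baseline_accuracy / baseline_cost)

# The baseline normalizes to 1.0 by construction:
print(ppd(99.0, 1.0, 99.0, 1.0))
# A configuration with slightly lower accuracy but a much lower per-query cost
# (here a made-up ~76x cost reduction) still dominates on PPD:
print(round(ppd(96.0, 1.0 / 76.0, 99.0, 1.0), 1))
```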
5. Practical Integration in LLM Agents
For system integration of Mermaid-based instruction graphs:
- Parsing and Loading: Use Mermaid DSL tools (e.g., mermaid-cli or regular expressions) to parse the instruction graph into in-memory objects.
- Caching and Amortization: Persist generated graphs keyed by template; reuse amortizes the graph-generation cost over N queries.
- Error Handling: Detect traversal issues (cycles, dead ends) and trigger dynamic re-planning via targeted subgraph regeneration.
- Scaling: Modularize complex workflows as independent subgraphs (e.g., arithmetic, verification); load only modules necessary for the current question.
- Node and Edge Principles: Maintain node atomicity (roughly 15–20 tokens per label), deterministic edge conditions, and explicit verification/feedback loops as needed for self-correction.
This approach enables LLM autonomy under explicit bounding constraints and guarantees traceability of the reasoning process.
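A minimal regex-based loader for the flowchart subset used in this article can be sketched as follows; it is a hypothetical helper, and production systems would defer to a full Mermaid parser such as mermaid-cli:

```python
import re

# Matches edges like:  A[Label] --> B[Label]  or  B -- "guard" --> C[Label]
EDGE_RE = re.compile(
    r'(\w+)(?:\[([^\]]*)\])?\s*--(?:\s*"([^"]*)"\s*--)?>\s*(\w+)(?:\[([^\]]*)\])?'
)

def parse_mermaid(text):
    """Parse a flowchart into (labels, edges); edges carry optional guard strings."""
    labels, edges = {}, []
    for line in text.splitlines():
        m = EDGE_RE.search(line)
        if not m:
            continue  # skip the header line and anything that is not an edge
        u, u_label, guard, v, v_label = m.groups()
        if u_label:
            labels[u] = u_label
        if v_label:
            labels[v] = v_label
        edges.append((u, v, guard))
    return labels, edges
```

The returned `(labels, edges)` pair is exactly the in-memory shape a traversal loop needs, and it preserves edge guards for deterministic branching.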
6. Relation to Instruction-Sequence Graph Encodings and Mermaid Syntax
Instruction-string representations, as introduced by López-Rubio (Lopez-Rubio, 11 Dec 2025), formalize graphs as machine-processable sequences over a finite instruction alphabet, where each symbol encodes cursor movement or edge insertion in the adjacency matrix. Decoding such an instruction string yields the corresponding adjacency structure, which can then be rendered in Mermaid as:
```mermaid
graph LR
v{i}-->v{j}
```
Such a translator:
- decodes the instruction string into the adjacency matrix, and
- emits Mermaid edge-list blocks, thereby providing a direct translation from instruction-sequence representations of graphs to Mermaid DAGs suitable for prompt-engineered reasoning plans.
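The decode-and-emit pipeline can be sketched with a toy alphabet; the symbols `R`/`D`/`E` below are our own illustrative encoding of cursor movement and edge insertion, not López-Rubio's exact scheme:

```python
def decode_to_mermaid(instructions, n):
    """Decode a toy cursor/edge instruction string into a Mermaid edge list.

    Toy alphabet (illustrative, not the paper's exact encoding):
      'R' moves the cursor right (next column), 'D' moves it down (next row),
      'E' inserts an edge at the current (row, col) of the adjacency matrix.
    """
    row = col = 0
    adj = [[0] * n for _ in range(n)]
    for sym in instructions:
        if sym == "R":
            col = (col + 1) % n
        elif sym == "D":
            row = (row + 1) % n
        elif sym == "E":
            adj[row][col] = 1
    lines = ["graph LR"]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                lines.append(f"v{i}-->v{j}")
    return "\n".join(lines)

print(decode_to_mermaid("REDRE", 3))  # v0->v1, then v1->v2
```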
Challenges include mapping pointer-movement traces to node layouts in Mermaid for large-scale graphs and ensuring that graph modularity in BRAID is preserved in Mermaid's statically rendered diagrams.
7. Significance and Implications
Mermaid-based instruction graphs provide a formal and practical bridge between structured reasoning, graphical workflow visualization, and LLM prompt optimization. Their bounded, machine-parsable format enables deterministic, token-efficient, and verifiable agent inference workflows, with empirical validation of superior performance and efficiency gains in diverse, challenging benchmarks (Amcalar et al., 17 Dec 2025). Their tight mapping to both graphical and instruction-sequence representations (as in (Lopez-Rubio, 11 Dec 2025)) suggests utility for both LLM-based reasoning and more general graph processing by LLMs. A plausible implication is accelerated adoption in production AI agents requiring traceable, stepwise, and controllable decision processes under explicit resource constraints.