Mermaid Tool for Diagram Generation

Updated 21 January 2026
  • Mermaid is a markup language that enables the creation of structured diagrams, such as flowcharts and sequence diagrams, from simple text commands.
  • It supports both manual authoring and automated generation workflows, with benchmarks such as MermaidSeqBench evaluating the syntax correctness and diagram fidelity of machine-generated output.
  • Advanced methodologies include converting static flowchart images into editable code using vision-language models, improving diagram accuracy and usability.

The Mermaid tool refers to a family of markup syntaxes and associated rendering engines for expressing structured diagrams—especially flowcharts and sequence diagrams—as renderable plain text. It enables both human designers and automated systems to describe workflows, processes, and interactions in a succinct textual form that can be rendered as vector or raster graphics in documentation, wikis, or design interfaces. Mermaid syntax is machine-parseable and widely adopted in technical environments for software engineering and process modeling. Recent research highlights critical advances at the intersection of Mermaid diagram generation, natural language processing, and computer vision, with a focus on benchmarks for LLM-authored diagrams and automated conversion of diagram images into editable Mermaid code (Shbita et al., 18 Nov 2025, Deka et al., 1 Dec 2025).

1. Diagram Specification and Mermaid Syntax

Mermaid syntax is a domain-specific language for representing diagrammatic structures (sequence diagrams, flowcharts, among others) in a textual format. Sequence diagrams use constructs such as participant, activate, deactivate, and control-flow blocks (alt, else, end) to model interactions between components. Flowcharts are expressed with node declarations and directional edges, allowing constructs for processes, decisions, and loops.
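
For example, a minimal sequence diagram exercising these constructs (the participant names are illustrative) is:

sequenceDiagram
  participant Client
  participant Server
  Client->>Server: request
  activate Server
  alt success
    Server-->>Client: 200 OK
  else failure
    Server-->>Client: error
  end
  deactivate Server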

A canonical flowchart snippet in Mermaid is:

flowchart TB
  Start([Start])
  Process1["First process"]
  Decision1{"Is result valid?"}
  Start --> Process1
  Process1 --> Decision1
  Decision1 -- Yes --> End([End])
  Decision1 -- No --> Process1

This syntax supports version-control, textual diffing, and automated generation, lending itself to both human-driven and programmatic workflows (Deka et al., 1 Dec 2025).

2. Automated Generation and Evaluation: MermaidSeqBench

Recent evaluation methodologies center on the ability of LLMs to produce syntactically valid and semantically faithful Mermaid sequence diagrams from natural language prompts. MermaidSeqBench is a benchmark engineered to probe LLM-to-Mermaid generation, providing 132 test cases derived from a blend of human curation, LLM-driven synthetic expansion, and rule-based variation for surface-level diversity (Shbita et al., 18 Nov 2025). Each test case pairs a detailed natural language specification (with "Purpose," "Main Components," and "Interactions" fields) with a reference Mermaid script.
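
A test case in this style can be sketched as a simple record. The field names ("Purpose," "Main Components," "Interactions") come from the benchmark description above; the concrete data layout and contents here are illustrative assumptions, not the benchmark's actual format:

```python
# Hypothetical sketch of a MermaidSeqBench-style test case. The three
# specification fields are named in the paper; everything else (layout,
# example content) is an assumption for illustration.
test_case = {
    "specification": {
        "Purpose": "Model a client authenticating against an API gateway.",
        "Main Components": ["Client", "Gateway", "AuthService"],
        "Interactions": [
            "Client sends credentials to Gateway",
            "Gateway forwards them to AuthService",
            "AuthService returns a token or an error",
        ],
    },
    "reference_diagram": """sequenceDiagram
    participant Client
    participant Gateway
    participant AuthService
    Client->>Gateway: credentials
    activate Gateway
    Gateway->>AuthService: validate(credentials)
    activate AuthService
    alt valid
        AuthService-->>Gateway: token
    else invalid
        AuthService-->>Gateway: error
    end
    deactivate AuthService
    Gateway-->>Client: response
    deactivate Gateway
""",
}
# An evaluator prompts an LLM with test_case["specification"] and scores the
# generated diagram against test_case["reference_diagram"].
```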

Evaluation is decomposed into the following axes:

  • Syntax Correctness: Defined by the correspondence of syntactic tokens (arrows, keywords, braces) between generated and reference diagrams. Precision, recall, and F1 metrics are computed over these tokens.
  • Activation Handling: Measures fidelity in lifeline activation/deactivation via precision, recall, and F1 over activate/deactivate events.
  • Error Handling: Assesses rendition of control-flow branches (alt, else, end) using precision/recall on correct block placement and nesting.
  • Practical Usability: Aggregates the above, or applies a holistic completeness and logical-consistency judgment—operationalized as macro-average F1 or as an explicit binary assessment on fidelity.
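
The token-level precision/recall/F1 computation behind the syntax-correctness axis can be sketched as follows. The simple regex tokenizer here is an assumption for illustration, not the benchmark's actual implementation:

```python
import re

# Extract Mermaid syntactic tokens (arrow operators and keywords) and score
# a generated diagram against its reference with precision/recall/F1.
# The token inventory below is a simplified assumption.
TOKEN_RE = re.compile(
    r"-->>|->>|-->|->"
    r"|\b(?:sequenceDiagram|participant|activate|deactivate|alt|else|end|loop)\b"
)

def syntax_tokens(source: str) -> list[str]:
    return TOKEN_RE.findall(source)

def prf1(pred: str, ref: str) -> tuple[float, float, float]:
    pred_toks, ref_toks = syntax_tokens(pred), syntax_tokens(ref)
    # Multiset overlap: each reference token can be matched at most once.
    overlap, remaining = 0, ref_toks.copy()
    for tok in pred_toks:
        if tok in remaining:
            remaining.remove(tok)
            overlap += 1
    p = overlap / len(pred_toks) if pred_toks else 0.0
    r = overlap / len(ref_toks) if ref_toks else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

A prediction that omits a `deactivate`, for instance, keeps perfect precision but loses recall, which is exactly the failure mode the activation-handling axis targets.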

Automated evaluation employs LLM-based judges (671B DeepSeek-V3 and 120B GPT-OSS), scoring candidate diagrams along multiple axes without human raters. Scores are averaged across judges per diagram and metric. This yields reproducible, fine-grained metrics sensitive to both structural and semantic fidelity (Shbita et al., 18 Nov 2025).

3. Vision-Language Diagram Extraction: Flowchart2Mermaid

Beyond language-based generation, research addresses the challenge of converting static flowchart images into editable Mermaid.js code. Flowchart2Mermaid is an interactive web system that leverages vision-LLMs (VLMs)—such as GPT-4.1, GPT-4o, Gemini-2.5-Flash—to analyze flowchart images and translate them, via a precisely engineered system prompt, into minimal valid Mermaid.js programs (Deka et al., 1 Dec 2025).

The system is architected with a browser-based front end and a Node.js backend. User workflows are enhanced by:

  • Inline Text Editing: Direct manipulation of node labels with automatic Mermaid code synchronization.
  • Drag-and-Drop Node Insertion: Visual construction with instant code generation.
  • Natural-Language Commands: GPT-4.1–driven assistant interprets user instructions to update diagrams semantically.

All views and code remain synchronized, permitting seamless transitions between graphical and textual editing modalities.

4. Quantitative Evaluation Metrics and Empirical Findings

Assessment of diagram fidelity in both LLM and VLM settings uses graph-aligned symbolic and structural metrics:

Metric                     Formula / Description                                              Range
Structural Accuracy (SA)   SA = (|N_pred ∩ N_gt| + |E_pred ∩ E_gt|) / (|N_gt| + |E_gt|)       [0, 1]
Flow Correctness (FC)      FC = (# correct execution paths) / (# ground-truth paths)          [0, 1]
Syntax Validity (SV)       SV = (# syntactically valid diagrams) / (total diagrams)           [0, 1]
Completeness (C)           C = |N_pred ∩ N_gt| / |N_gt|                                       [0, 1]
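
Given node and edge sets extracted from predicted and ground-truth diagrams, the set-based metrics above reduce to a few lines. Extracting N and E from Mermaid source is assumed to happen upstream; the example sets below are illustrative:

```python
# Graph-aligned metrics over node sets (N) and edge sets (E), following the
# formulas in the table above. Sets are assumed to be extracted upstream.
def structural_accuracy(n_pred, n_gt, e_pred, e_gt):
    denom = len(n_gt) + len(e_gt)
    return (len(n_pred & n_gt) + len(e_pred & e_gt)) / denom if denom else 0.0

def completeness(n_pred, n_gt):
    return len(n_pred & n_gt) / len(n_gt) if n_gt else 0.0

def syntax_validity(valid_count, total):
    return valid_count / total if total else 0.0

# Example: a prediction that recovers every node but misses one edge.
n_gt = {"Start", "Process1", "Decision1", "End"}
e_gt = {("Start", "Process1"), ("Process1", "Decision1"), ("Decision1", "End")}
n_pred = set(n_gt)
e_pred = e_gt - {("Decision1", "End")}
sa = structural_accuracy(n_pred, n_gt, e_pred, e_gt)  # (4 + 2) / (4 + 3)
```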

On benchmark datasets, large VLMs (GPT-4.1, Gemini-2.5-Flash) achieve near-perfect node/edge F1, structural accuracy (SA ≈ 0.988), and syntax validity (SV > 0.99). Smaller models perform significantly worse, in particular omitting branches or mislabeling nodes (Deka et al., 1 Dec 2025). For LLM-based sequence diagram generation, larger models (7B–8B) consistently outperform smaller ones (0.5B–2B), with syntax F1 rising from ≈0.49 (Qwen 0.5B) to above 0.85 (Qwen 7B), and similar trends in activation and error handling. Smaller models frequently omit or misplace vital Mermaid keywords, with overall practical-usability scores rarely exceeding 0.40, compared with ~0.75–0.85 for large models (Shbita et al., 18 Nov 2025).

5. Interactive Workflow and Best Practices for Mermaid-Based Automation

Integrating Mermaid with advanced language and vision models enables several practical workflows:

  • Uploading static images for automatic code extraction and iterative correction via VLMs.
  • Generating sequence diagrams from narrative descriptions, verified against reference scripts via LLM-as-a-judge benchmarks.
  • Editing and refining diagrams through a mixed-initiative model (inline, drag-and-drop, NL commands).

Empirically supported best practices include:

  • Prefer large models (≥7B parameters) for high-fidelity syntax and semantics.
  • Provide 2–3 in-context Mermaid examples in prompts to prime model output.
  • Perform syntax validation using the Mermaid Live Editor or CLI as a post-processing step.
  • Normalize participant names to encourage model consistency.
  • Explicitly mention alternative/control-flow constructs in NL specifications to boost inclusion in outputs.
  • Apply lightweight rule-based consistency checks in tandem with LLM-powered judgments for quality control (Shbita et al., 18 Nov 2025).
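
A lightweight rule-based consistency check of the kind suggested above might verify that `alt`/`loop` blocks are closed by `end` and that every `activate` has a matching `deactivate`. This is an illustrative sketch, not an official Mermaid linter:

```python
import re

# Minimal consistency checker for Mermaid sequence diagrams: reports
# unbalanced blocks and unmatched activations. The rule set here is an
# illustrative assumption; a real validator would cover far more syntax.
def check_sequence_diagram(source: str) -> list[str]:
    errors = []
    block_depth = 0   # open alt/loop/opt/par blocks
    active = []       # currently activated lifelines
    for lineno, raw in enumerate(source.splitlines(), 1):
        line = raw.strip()
        if re.match(r"^(alt|loop|opt|par)\b", line):
            block_depth += 1
        elif line == "end":
            block_depth -= 1
            if block_depth < 0:
                errors.append(f"line {lineno}: 'end' without an open block")
                block_depth = 0
        elif m := re.match(r"^activate\s+(\w+)", line):
            active.append(m.group(1))
        elif m := re.match(r"^deactivate\s+(\w+)", line):
            if m.group(1) in active:
                active.remove(m.group(1))
            else:
                errors.append(f"line {lineno}: deactivate of inactive '{m.group(1)}'")
    if block_depth > 0:
        errors.append(f"{block_depth} block(s) missing 'end'")
    for name in active:
        errors.append(f"'{name}' activated but never deactivated")
    return errors
```

Checks like this run in tandem with LLM judges: they cannot assess semantic fidelity, but they catch the keyword-omission errors that small models make most often.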

6. Limitations and Further Directions

Current systems exhibit several constraints:

  • VLMs may hallucinate nodes or misinterpret faint/complex structures in low-quality diagrams.
  • Mermaid code generation is limited by underlying model capacity; edge cases and complex constructs such as nested loops and swimlanes are imperfectly rendered.
  • Automation does not obviate the need for human oversight; semantic errors and subtle branching mistakes persist, particularly for mid-sized or smaller models.
  • Flowchart2Mermaid does not yet support BPMN, UML, or real-time diagram consistency verification, though these are cited as future research directions (Deka et al., 1 Dec 2025).

This suggests that while automation tools exploiting Mermaid can dramatically accelerate structured diagram creation, domain expertise remains essential for error analysis and the curation of complex workflows.
