- The paper introduces DiagramGenBenchmark and DiagramAgent, enabling structured and editable diagram generation directly from textual inputs.
- The framework integrates four specialized agents (Plan, Code, Check, Diagram-to-Code) to convert and validate text into precise diagram code.
- Experiments show DiagramAgent outperforming existing baselines on Pass@1, ROUGE-L, and CodeBLEU, with human evaluations supporting the automatic metrics.
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing
Introduction
The paper introduces a novel task of text-to-diagram generation, emphasizing the creation of structured visual representations directly from textual inputs. Traditional approaches in text-to-image and text-to-code generation face considerable challenges in logical organization and flexibility, resulting in unstructured or difficult-to-edit diagram outputs.
Figure 1: Challenges in Existing Text-to-Image and Text-to-Code Methods for Diagram Generation.
To address these limitations, the paper presents two primary contributions: DiagramGenBenchmark and the DiagramAgent framework. The benchmark evaluates diagram generation across eight categories, such as flowcharts and mind maps. DiagramAgent, in turn, integrates four agents (Plan Agent, Code Agent, Check Agent, and Diagram-to-Code Agent) to handle complex diagram generation and editing tasks.
DiagramAgent Workflow
DiagramAgent's architecture comprises four main agents, each designed to handle specific tasks in the diagram generation and editing pipeline.
Figure 2: Workflow of DiagramAgent. DiagramAgent processes the user query to handle diagram generation, coding, and editing tasks.
- Plan Agent: This component interprets user instructions and manages the initial stages of diagram generation, ensuring complete and coherent input for subsequent processing.
- Code Agent: Responsible for converting textual descriptions into diagram-specific code, this agent generates both new diagrams and modifications to existing ones.
- Check Agent: Ensures the logical coherence and correctness of the generated code, making necessary adjustments via feedback loops.
- Diagram-to-Code Agent: Facilitates the reverse engineering process, converting diagrams back into code for further refinement or editing.
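The division of labor among the four agents can be illustrated with a toy pipeline. This is only a minimal sketch under strong assumptions, not the paper's implementation: every function name here (`plan_agent`, `code_agent`, `check_agent`, `diagram_to_code_agent`, `generate_diagram`) is hypothetical, the agents are simple rules rather than language models, the target language is assumed to be Graphviz DOT, and a "diagram" is stood in for by a plain edge list.

```python
from typing import Callable, List, Tuple

Edge = Tuple[str, str]


def plan_agent(query: str) -> List[str]:
    """Toy Plan Agent: turn 'A -> B -> C' into an ordered list of steps."""
    steps = [part.strip() for part in query.split("->") if part.strip()]
    if len(steps) < 2:
        raise ValueError("query must name at least two steps separated by '->'")
    return steps


def code_agent(steps: List[str], feedback: str = "") -> str:
    """Toy Code Agent: emit DOT code for a linear flowchart.

    The feedback argument is accepted but ignored by this rule-based stand-in.
    """
    edges = [f'  "{a}" -> "{b}";' for a, b in zip(steps, steps[1:])]
    return "digraph G {\n" + "\n".join(edges) + "\n}"


def check_agent(code: str) -> Tuple[bool, str]:
    """Toy Check Agent: cheap structural checks, returning (ok, feedback)."""
    if not code.lstrip().startswith("digraph"):
        return False, "code must start with 'digraph'"
    if code.count("{") != code.count("}"):
        return False, "unbalanced braces"
    return True, ""


def diagram_to_code_agent(edges: List[Edge]) -> str:
    """Toy Diagram-to-Code Agent: recover DOT code from an edge list."""
    body = "\n".join(f'  "{a}" -> "{b}";' for a, b in edges)
    return "digraph G {\n" + body + "\n}"


def generate_diagram(
    query: str,
    code_fn: Callable[[List[str], str], str] = code_agent,
    check_fn: Callable[[str], Tuple[bool, str]] = check_agent,
    max_rounds: int = 3,
) -> str:
    """Plan, generate code, then loop on the checker's feedback until it passes."""
    steps = plan_agent(query)
    feedback = ""
    for _ in range(max_rounds):
        code = code_fn(steps, feedback)
        ok, feedback = check_fn(code)
        if ok:
            return code
    raise RuntimeError(f"no valid code after {max_rounds} rounds: {feedback}")
```

Calling `generate_diagram("Start -> Review -> Done")` returns a three-node DOT graph. Passing a different `code_fn` shows the feedback loop: if the first attempt fails the check, the checker's message is handed back to the code step on the next round.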
DiagramGenBenchmark
DiagramGenBenchmark is a dataset tailored for evaluating diagram generation and editing tasks. It covers eight diagram types and serves as a common test bed for assessing models such as DiagramAgent.
Figure 3: Example queries and diagrams.
Experimental Results
In the paper's experiments, DiagramAgent outperforms existing baselines, achieving high scores on metrics such as Pass@1, ROUGE-L, and CodeBLEU.
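For readers unfamiliar with these metrics, two of them are simple enough to sketch. The code below is an illustrative approximation, not the paper's evaluation script: it assumes whitespace tokenization, computes ROUGE-L as the balanced F-measure over the longest common subsequence, and treats Pass@1 as the fraction of problems whose single generated sample passes whatever check is applied. The function names are hypothetical, and CodeBLEU is omitted because it needs language-specific syntax and data-flow analysis.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the harmonic mean of LCS precision and LCS recall."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def pass_at_1(passed: Sequence[bool]) -> float:
    """Fraction of problems whose one generated sample passed the check."""
    if not passed:
        raise ValueError("need at least one result")
    return sum(passed) / len(passed)
```

As a quick check, the candidate `a -> b ;` shares three of its four tokens in order with the reference `a -> c ;`, so `rouge_l_f1` gives 0.75, and `pass_at_1([True, False, True, True])` also gives 0.75.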
Figure 4: Human evaluation results for different models on diagram generation and diagram editing tasks.
Key results demonstrate DiagramAgent's efficacy in generating both accurate code and visually coherent diagrams. Human evaluations align closely with the objective metrics, supporting the agent's usability in real-world diagram generation scenarios.
Conclusion
DiagramGenBenchmark and DiagramAgent together provide a benchmark and a framework for text-to-diagram generation and editing, giving future research a common point of comparison. While the work advances the field, the paper also identifies areas for future development, particularly the handling of complex diagram structures.