SwarmFoam: Intelligent CFD Simulation
- SwarmFoam is a multi-agent CFD simulation framework that integrates LLMs for parsing multi-modal inputs and automating OpenFOAM case setups.
- It orchestrates six specialized agents to manage tasks from geometry extraction and file generation to intelligent error diagnosis and correction.
- The system employs Retrieval-Augmented Generation to enhance context-rich file creation, achieving an 84% pass rate over 25 test cases and reducing token usage by 83.85% relative to Foam-Agent.
SwarmFoam is a multi-agent system for intelligent Computational Fluid Dynamics (CFD) simulation built atop OpenFOAM and powered by multiple types of LLMs. It orchestrates six specialized agents to parse and translate multi-modal input (image and text), generate and correct OpenFOAM case files, execute simulations, diagnose errors, and produce scientific visualizations. SwarmFoam is distinguished by dedicated modules for Multi-Modal Perception, Intelligent Error Correction, and Retrieval-Augmented Generation (RAG), allowing it to handle complex geometry and simulation tasks that are challenging for prior LLM-driven CFD frameworks. Experimental evaluation demonstrates robust adaptability across simulation inputs, achieving a pass rate of 84% over 25 test cases (Yang et al., 12 Jan 2026).
1. System Composition and Agent Roles
SwarmFoam is architected as a six-agent system, each agent mapped to a distinct phase of the simulation pipeline. Agents communicate through a shared message bus and interact with modular LLM interfaces.
| Agent | Functional Role | Example Actions |
|---|---|---|
| Observer | Input decomposition, perception | ObservePicture, DivideTask |
| Architect | Case structure selection | SetupFramework |
| InputWriter | File generation/correction | FirstWrite, CorrectFile |
| Runner | Simulation execution | RunFoamCase |
| Reviewer | Error identification/correction | HandleError, EndMark |
| ParaMaster | Post-processing/visualization | WriteCode, RunCode |
The system explicitly embeds each agent’s context into standardized prompt templates when invoking LLMs, maintaining consistency in both file format and simulation instructions.
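The agent-to-action mapping and the shared message bus can be pictured with a minimal sketch. The `Message` and `MessageBus` classes below are hypothetical illustrations, not SwarmFoam's actual classes; only the agent and action names come from the table above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str      # agent that produced the message
    recipient: str   # agent expected to act on it
    action: str      # action the sender performed, e.g. "DivideTask"
    payload: dict = field(default_factory=dict)


class MessageBus:
    """Hypothetical in-memory bus with one FIFO queue per recipient."""

    def __init__(self):
        self._queues = defaultdict(deque)
        self.history = []  # every published message, in publish order

    def publish(self, msg: Message) -> None:
        self.history.append(msg)
        self._queues[msg.recipient].append(msg)

    def next_for(self, agent: str):
        """Pop the oldest pending message for `agent`, or return None."""
        queue = self._queues[agent]
        return queue.popleft() if queue else None


# Role-to-action mapping taken from the table above.
AGENT_ACTIONS = {
    "Observer": ["ObservePicture", "DivideTask"],
    "Architect": ["SetupFramework"],
    "InputWriter": ["FirstWrite", "CorrectFile"],
    "Runner": ["RunFoamCase"],
    "Reviewer": ["HandleError", "EndMark"],
    "ParaMaster": ["WriteCode", "RunCode"],
}

bus = MessageBus()
bus.publish(Message("Observer", "Architect", "DivideTask", {"solver": "pisoFoam"}))
msg = bus.next_for("Architect")
```

Keeping the full history on the bus is one plausible way to give the Reviewer access to earlier messages when it diagnoses an error.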
2. LLM Support and Prompt Construction
SwarmFoam utilizes two categories of LLMs:
- Text-only LLMs (Type 1 interface, e.g., Deepseek-R1, Deepseek-V3, gemini-2.5-pro, gpt-5-pro, gpt-4o) for natural language-driven subtasks.
- Multi-Modal LLMs (Type 2 interface, e.g., gemini-2.5-flash, Qwen-series, Llama-4-series) for combined image and text input.
Agents select and initialize an LLM according to the modality required for their task. Prompts are constructed from “standard templates” with placeholders for task descriptions, physical and geometric information, and reference text snippets retrieved via RAG. The LLM sampling temperature is pinned to a fixed value to enforce the strict output syntax that OpenFOAM requires, and responses are post-processed to remove extraneous tokens.
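As a rough illustration of template filling and output clean-up, the sketch below uses Python's `string.Template`. The template wording, the function names `build_prompt` and `strip_extraneous`, and the assumption that the extraneous tokens are a Markdown fence plus surrounding chatter are all hypothetical; the source does not reproduce SwarmFoam's actual templates or post-processing rules.

```python
import re
from string import Template

FENCE = "`" * 3  # Markdown code-fence marker that a model may wrap around a file

# Hypothetical template; placeholders mirror the fields named in the text.
FILE_PROMPT = Template(
    "Task: $task\n"
    "Geometric description: $geometry\n"
    "Physical description: $physics\n"
    "<Reference information>\n$references\n"
    "Return only the contents of $filename."
)


def build_prompt(task, geometry, physics, references, filename):
    """Fill the template; `references` is a list of retrieved text snippets."""
    return FILE_PROMPT.substitute(
        task=task,
        geometry=geometry,
        physics=physics,
        references="\n".join(references),
        filename=filename,
    )


def strip_extraneous(reply: str) -> str:
    """Keep only the file body: unwrap a fenced block if the model added one."""
    fenced = re.search(FENCE + r"[^\n]*\n(.*?)" + FENCE, reply, flags=re.DOTALL)
    body = fenced.group(1) if fenced else reply
    return body.strip() + "\n"
```

A deterministic clean-up step like this matters because OpenFOAM dictionaries are parsed strictly, so any stray commentary left in a file causes a read failure.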
3. Multi-Modal Perception Pipeline
Image and text data fusion for geometry and physical property extraction is central to SwarmFoam’s flexibility. Two methods were tested; the pre-parsing method was chosen via ablation:
Method 1 (Pre-Parsing):
- Tokenize and embed the text and image inputs using HuggingfaceEmbeddings.
- Concatenate the embeddings and pass them to a multi-modal LLM with a detailed ObservePicture prompt.
- Extract structured "Geometric description" (e.g., dimensions, vertex coordinates, face labels) and "Physical description" (e.g., fluids, boundary conditions).
- Pass these textual outputs to InputWriter for generating blockMeshDict and related files.
Method 2 (Direct Utilization):
- InputWriter sends raw image and text directly to MM-LLM to generate blockMeshDict.
- Ablation revealed severe degradation: 150.8% increase in iterations, 54% drop in pass rate.
The selection of Method 1 resulted in improved preservation of spatial and physical information, supporting complex OpenFOAM case preparation.
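In the pre-parsing method, the hand-off from Observer to InputWriter is plain text with two labelled parts. A minimal sketch of splitting such output is shown below; the exact layout (each section introduced by a header line ending in a colon) and the function name `split_descriptions` are assumptions, since the source does not specify the output format.

```python
def split_descriptions(observer_text: str) -> dict:
    """Split Observer output into its two labelled sections.

    Assumes each section starts with a line reading
    'Geometric description:' or 'Physical description:'.
    """
    sections = {"Geometric description": [], "Physical description": []}
    current = None
    for line in observer_text.splitlines():
        stripped = line.strip()
        header = stripped.rstrip(":")
        if header in sections:
            current = header          # switch to the newly opened section
        elif current and stripped:
            sections[current].append(stripped)
    return {name: "\n".join(lines) for name, lines in sections.items()}
```

Lines that appear before the first header are ignored, so any preamble from the model does not leak into either description.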
4. Intelligent Error Correction Mechanism
SwarmFoam’s Reviewer agent implements a first-error-priority algorithm for error handling during simulation:
    Algorithm 1: Error Diagnosis and Correction
    Require: Message history H, Configuration C
    Ensure: Error diagnosis D, Error file path F
    ...
    14: first_err ← errors[0]
    15: (D_parsed, F_parsed) ← Parse_Error_Log(first_err)
    16: error_type ← Classify_Error_Type(LLM_Agent, first_err, files)
    ...
    24: return D, F
Only the first error in the simulation log is evaluated. It is classified as either “format error” or “missing file,” and the classification triggers file regeneration via InputWriter. The process iterates until the simulation succeeds or the iteration threshold is reached. This strategy yields marked reductions in token usage compared to prior frameworks.
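The first-error-priority idea can be sketched as a small retry loop. This is an illustration under stated assumptions, not the paper's Algorithm 1: the classifier is a rule-based stand-in for the LLM call, `run_case` and `fix` are hypothetical callbacks, and `max_iters=10` is a placeholder because the actual threshold is not given in the text. The only log convention relied on is OpenFOAM's `--> FOAM FATAL ERROR` / `--> FOAM FATAL IO ERROR` banner.

```python
import re
from typing import Callable, Optional

# Banner that OpenFOAM prints at the start of a fatal error message.
FATAL = re.compile(r"-->\s*FOAM FATAL (?:IO )?ERROR")


def first_error(log: str) -> Optional[str]:
    """Return the text of the first fatal-error block in a log, or None."""
    match = FATAL.search(log)
    if match is None:
        return None
    nxt = FATAL.search(log, match.end())
    end = nxt.start() if nxt else len(log)
    return log[match.start():end].strip()


def classify(error_text: str) -> str:
    """Rule-based stand-in for the LLM classifier described in the text."""
    if "cannot find file" in error_text.lower():
        return "missing file"
    return "format error"


def correction_loop(
    run_case: Callable[[], str],
    fix: Callable[[str, str], None],
    max_iters: int = 10,  # assumed value; the real threshold is not stated
) -> bool:
    """Run, diagnose only the first error, repair, and retry."""
    for _ in range(max_iters):
        err = first_error(run_case())
        if err is None:
            return True          # no fatal error: treat the run as successful
        fix(classify(err), err)  # later errors in the same log are ignored
    return False
```

Handling one error per iteration keeps each diagnosis prompt short, which is consistent with the token savings the text attributes to this strategy; later errors are often consequences of the first and disappear once it is fixed.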
5. Retrieval-Augmented Generation (RAG) Framework
Six local help documents (case structures, solver guidance, etc.) are indexed using a chunking and embedding process:
- Indexing: Documents split into text chunks, embedded via HuggingfaceEmbeddings, stored in a local vector DB.
- Querying: Agents query the vector DB with the current prompt; the top-k nearest chunks (by cosine similarity) are retrieved and appended under <Reference information> in the agent’s template.
The RAG mechanism provides low-latency, context-rich inferences for architecture selection and file generation, enhancing case fidelity and robustness.
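The index-then-query flow can be illustrated with a dependency-free sketch. The bag-of-words `embed` function is a toy stand-in for the HuggingfaceEmbeddings model, the in-memory list replaces the vector DB, and the chunk size and `k` defaults are arbitrary assumptions; only the cosine-similarity ranking and the <Reference information> placement follow the description above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text: str, size: int = 8) -> list:
    """Split a document into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def top_k(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def with_references(prompt: str, refs: list) -> str:
    """Append retrieved chunks under the reference marker."""
    return prompt + "\n<Reference information>\n" + "\n".join(refs)
```

In a real deployment the embedding and nearest-neighbour search would be delegated to the embedding model and vector store; the ranking logic stays the same.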
6. Performance Evaluation and Metrics
Evaluation used 25 cases (10 natural language, 15 multi-modal), encompassing diverse CFD regimes (incompressible, multiphase, combustion, MHD). Four metrics were tracked: iterations, token usage, pass rate, and cost.
SwarmFoam achieved an 84% pass rate overall (80% NL, 86.7% multi-modal). SwarmFoam matched Foam-Agent's pass rate while reducing token usage by 83.85%, a saving attributed to the first-error-priority correction loop.
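The three reported rates are mutually consistent, which a short check makes explicit. The pass counts below (8 of 10 and 13 of 15) are inferred from the percentages and case counts rather than quoted from the source.

```python
# Case counts are from the text; pass counts are inferred from the percentages.
cases = {"natural language": 10, "multi-modal": 15}
passed = {"natural language": 8, "multi-modal": 13}


def pass_rate(n_passed: int, n_total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * n_passed / n_total, 1)


rates = {name: pass_rate(passed[name], cases[name]) for name in cases}
overall = pass_rate(sum(passed.values()), sum(cases.values()))

print(rates)    # {'natural language': 80.0, 'multi-modal': 86.7}
print(overall)  # 84.0
```

So 21 of the 25 cases passed, and the overall figure is the case-weighted combination of the two per-modality rates.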
7. Representative Workflow Example
A prototypical run consists of:
- User submits a text instruction ("Please use pisoFoam to solve the flow field after airflow passes obstacle...") and an image (schematic with geometric specifications).
- Observer Agent parses geometry and physical parameters (domain, obstacle location, flow variables, inlet BCs, time parameters).
- Architect Agent retrieves matching case structure and emits file generation instructions.
- InputWriter Agent generates each file: blockMeshDict, controlDict, fvSchemes, fvSolution, transportProperties, velocity/pressure files, Allrun, leveraging RAG and LLM synthesis.
- Runner Agent executes Allrun; if an error is encountered (e.g., a missing blockMeshDict), Reviewer Agent diagnoses it and InputWriter Agent generates the missing or corrected file.
- Iterative correction continues until simulation is successful.
- On completion and post-processing request, ParaMaster Agent produces ParaView visualizations via pvpython scripts, yielding publication-ready output images.
This workflow highlights SwarmFoam’s capacity for dual-modality case specification, automated error resolution, and end-to-end CFD pipeline automation (Yang et al., 12 Jan 2026).