Applying Process Mining on Scientific Workflows: a Case Study on High Performance Computing Data
Abstract: Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
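To make the idea of a case ID concrete, the following minimal sketch groups jobs into cases when their interdependencies are explicitly specified (for example, declared at submission time with SLURM's `sbatch --dependency` option). It is an illustration under stated assumptions, not the method developed in the paper. The function name `assign_case_ids`, the input format (a mapping from job ID to the job IDs it depends on), and the choice of the smallest job ID as the case ID are all hypothetical. Jobs connected directly or transitively by a dependency are placed in the same case using a union-find structure.

```python
def assign_case_ids(dependencies):
    """Map each job ID to a case ID.

    `dependencies` maps a job ID to the list of job IDs it depends on
    (hypothetical input format). Jobs linked by dependencies, directly
    or transitively, share a case. The case ID is the smallest job ID
    in the group (an arbitrary convention chosen for this sketch).
    """
    parent = {}

    def find(job):
        # Register unseen jobs as their own root, then walk to the root.
        parent.setdefault(job, job)
        while parent[job] != job:
            parent[job] = parent[parent[job]]  # path halving
            job = parent[job]
        return job

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller job ID as the root of the merged group.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for job, deps in dependencies.items():
        find(job)  # make sure jobs without dependencies are registered
        for dep in deps:
            union(job, dep)

    return {job: find(job) for job in list(parent)}


# Hypothetical job records: job ID -> IDs of the jobs it depends on.
example = {101: [], 102: [101], 103: [102], 200: [], 201: [200], 300: []}
cases = assign_case_ids(example)
```

In this example, jobs 101, 102, and 103 form one case, jobs 200 and 201 form a second, and job 300 is a case on its own. Jobs whose interdependencies are not explicitly specified cannot be correlated this way and need a different technique.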