Papers
Topics
Authors
Recent
Search
2000 character limit reached

Scaling on Frontier: Uncertainty Quantification Workflow Applications using ExaWorks to Enable Full System Utilization

Published 1 Jul 2024 in cs.DC | (2407.01484v1)

Abstract: When running at scale, modern scientific workflows require middleware to handle allocated resources, distribute computing payloads and guarantee a resilient execution. While individual steps might not require sophisticated control methods, bringing them together as a whole workflow requires advanced management mechanisms. In this work, we used RADICAL-EnTK (Ensemble Toolkit) - one of the SDK components of the ECP ExaWorks project - to implement and execute the novel Exascale Additive Manufacturing (ExaAM) workflows on up to 8000 compute nodes of the Frontier supercomputer at the Oak Ridge Leadership Computing Facility. EnTK allowed us to address challenges such as varying resource requirements (e.g., heterogeneity, size, and runtime), different execution environment per workflow, and fault tolerance. And a native portability feature of the developed EnTK applications allowed us to adjust these applications for Frontier runs promptly, while ensuring an expected level of resource utilization (up to 90%).

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.