Scylla: A Mesos Framework for Container Based MPI Jobs

Published 20 May 2019 in cs.PF and cs.DC | (1905.08386v1)

Abstract: Open source cloud technologies provide a wide range of support for creating customized compute node clusters for scheduling tasks and managing resources. In cloud infrastructures such as Jetstream and Chameleon, which are used for scientific research, users receive complete control of the Virtual Machines (VMs) allocated to them, including root access. This gives HPC users an opportunity to experiment with new resource management technologies such as Apache Mesos, which have proven scalability, flexibility, and fault tolerance. Containerization technology, which eases the development and deployment of HPC tools on the cloud, has matured and is gaining interest in the scientific community. In particular, several well-known scientific code bases now have publicly available Docker containers. While Mesos supports executing Docker containers individually, it does not support inter-container communication or the orchestration of containers for a parallel or distributed application. In this paper, we present the design, implementation, and performance analysis of a Mesos framework, Scylla, which integrates Mesos with Docker Swarm to enable orchestration of MPI jobs on a cluster of VMs acquired from the Chameleon cloud [1]. Scylla uses Docker Swarm for communication between containerized tasks (MPI processes) and Apache Mesos for resource pooling and allocation. Scylla takes a policy-driven approach to deciding how containers are distributed across the nodes, based on the CPU, memory, and network throughput requirements of each application.
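The policy-driven distribution described in the abstract can be illustrated with a small sketch. This is not Scylla's implementation and does not use the Mesos or Docker Swarm APIs: the `Offer` class, the `place_tasks` function, and the `"pack"` and `"spread"` policy names are all hypothetical, chosen only to show how a policy might map MPI tasks onto per-node resource offers. Network throughput is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical stand-in for a per-node resource offer."""
    node: str
    cpus: float
    mem_mb: float


def place_tasks(offers, n_tasks, cpus_per_task, mem_per_task, policy="pack"):
    """Return {node: task_count}, or raise if the offers cannot hold the job."""
    # Number of tasks each node can host given both CPU and memory limits.
    capacity = {
        o.node: int(min(o.cpus // cpus_per_task, o.mem_mb // mem_per_task))
        for o in offers
    }
    if sum(capacity.values()) < n_tasks:
        raise ValueError("offers cannot satisfy the job")

    placement = {node: 0 for node in capacity}
    remaining = n_tasks
    if policy == "pack":
        # Fill the largest nodes first so tasks share as few nodes as possible.
        for node in sorted(capacity, key=capacity.get, reverse=True):
            take = min(capacity[node], remaining)
            placement[node] = take
            remaining -= take
    elif policy == "spread":
        # Hand out one task at a time, round-robin, to nodes with free slots.
        while remaining > 0:
            for node in capacity:
                if remaining > 0 and placement[node] < capacity[node]:
                    placement[node] += 1
                    remaining -= 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    return {node: n for node, n in placement.items() if n > 0}


offers = [Offer("a", cpus=4, mem_mb=8192), Offer("b", cpus=4, mem_mb=8192)]
packed = place_tasks(offers, 4, cpus_per_task=1, mem_per_task=1024, policy="pack")
spread = place_tasks(offers, 4, cpus_per_task=1, mem_per_task=1024, policy="spread")
```

Under these assumptions, packing suits a communication-heavy job, since its tasks land on as few nodes as possible, while spreading suits a job limited by per-node memory or bandwidth.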

Citations (14)
