Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

Published 4 Nov 2022 in cs.DC | (2211.02682v1)

Abstract: Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving. Diverse workloads lead to a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. Two promising use cases of composable memory subsystems are studied -- fine-grained capacity provisioning and scalable bandwidth provisioning. We developed an emulator to explore the performance impact of various memory compositions. We also provide a profiler to identify the memory usage patterns in applications and their optimization opportunities. Seven scientific and six graph applications are evaluated on various emulated memory configurations. Three out of seven scientific applications had less than 10% performance impact when the pooled memory backed 75% of their memory footprint. The results also show that a dynamically configured high-bandwidth system can effectively support bandwidth-intensive unstructured mesh-based applications like OpenFOAM. Finally, we identify interference through shared memory pools as a practical challenge for adoption on HPC systems.

Authors (3)

Citations (22)

Summary

The paper reports that when pooled memory backs 75% of the memory footprint, three of seven scientific applications slow down by less than 10% and most stay below 18% degradation, supporting the feasibility of memory pooling.
The methodology employs an emulator and profiler to assess diverse memory compositions and high-bandwidth configurations across various HPC applications.
The study identifies challenges with interference in shared memory pools, emphasizing the need for effective system-level coordination in dynamic memory environments.

Introduction

The paper discusses how Compute Express Link (CXL)-enabled memory pooling could address a limitation of current high-performance computing (HPC) systems: memory is statically configured and tightly coupled to compute nodes, which leads to low memory utilization. Because CXL lets memory capacity and bandwidth be provisioned separately from compute nodes, memory can be allocated at finer granularity, improving resource utilization. The study evaluates composable memory subsystems and shows the benefits of CXL-enabled memory pooling for a range of HPC workloads.

CXL-enabled Memory Subsystem Design

CXL is an interconnect standard that links processors, accelerators, and memory with low latency and high bandwidth, which makes memory disaggregation far more practical. The paper proposes a possible design for a composable memory system built on CXL links (Figure 1). The system can be configured dynamically, for example by scaling bandwidth or adding different memory types, so that each workload receives the memory resources it needs, improving both performance and utilization.

Figure 1: A potential composable memory system design. CXL enables multiple memory organizations on one system.

Methodology

The research uses an emulator to explore the performance effects of different memory compositions on CXL-enabled systems, and a profiler to identify applications' memory usage patterns and optimization opportunities. Both were applied to seven scientific and six graph applications under a range of emulated memory configurations.
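The summary does not say how the profiler works. As a hypothetical sketch only, one way to expose capacity slack is to reduce a sampled footprint time series (for example, resident set size polled at intervals) to its peak, its time-weighted mean, and the share of runtime during which it would exceed a given local capacity. The function and field names below are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FootprintSummary:
    peak_bytes: int
    mean_bytes: float       # time-weighted mean footprint
    frac_time_above: float  # share of sampled time with footprint > local capacity


def summarize_footprint(samples, local_capacity_bytes):
    """Summarize (timestamp_s, footprint_bytes) samples given in time order.

    Hypothetical helper: each sample's footprint is assumed to hold until the
    next timestamp (a step function); the last sample only closes the final
    interval, though it still counts toward the peak.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to span an interval")
    total = 0.0
    weighted = 0.0
    above = 0.0
    for (t0, b0), (t1, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        total += dt
        weighted += b0 * dt
        if b0 > local_capacity_bytes:
            above += dt
    peak = max(b for _, b in samples)
    if total == 0:
        return FootprintSummary(peak, 0.0, 0.0)
    return FootprintSummary(peak, weighted / total, above / total)
```

A large gap between peak and mean footprint, or a small share of time above local capacity, suggests an application could run with less local memory and borrow the remainder from a pool.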

Emulation was performed on NUMA systems, with remote NUMA memory standing in for CXL-attached pooled memory, to measure how pooled memory affects execution time and how bandwidth scales.
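The capacity compositions can be pictured as a placement policy. The sketch below assumes a simple local-first policy: pages fill local memory up to a configured limit, and the remainder spills to the pooled (remote NUMA) node. The 4 KiB page size and the policy itself are assumptions for illustration, not the paper's documented setup.

```python
PAGE_SIZE = 4096  # assumed base page size in bytes (4 KiB)


def place_pages(footprint_bytes, local_capacity_bytes, page_size=PAGE_SIZE):
    """Return (local_pages, pooled_pages) under an assumed local-first,
    spill-to-pool policy."""
    if footprint_bytes < 0 or local_capacity_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total_pages = -(-footprint_bytes // page_size)  # ceiling division
    local_pages = min(total_pages, local_capacity_bytes // page_size)
    return local_pages, total_pages - local_pages


def pooled_fraction(footprint_bytes, local_capacity_bytes, page_size=PAGE_SIZE):
    """Fraction of the footprint's pages that land in pooled memory."""
    local, pooled = place_pages(footprint_bytes, local_capacity_bytes, page_size)
    total = local + pooled
    return pooled / total if total else 0.0
```

For example, a 16 GiB footprint with 4 GiB of local memory puts 75% of pages in the pool, the ratio highlighted in the abstract.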

Performance Evaluation

Composable Memory Capacity

The study evaluated the performance impact of different mixes of local and pooled memory (Figure 2). For most scientific workloads, placing up to 75% of the memory footprint in pooled memory caused less than 18% performance degradation, and three of the seven scientific applications slowed by less than 10%. This indicates that composing memory capacity from local and pooled memory is feasible at a modest performance cost.

Figure 2: Four compositions of the memory subsystem using a variable amount of local memory and pooled memory.
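To see why slowdown depends on more than the pooled fraction, a first-order model helps: if a share s of memory accesses goes to pooled memory with latency L_pool instead of L_local, average latency grows by a factor of 1 + s(L_pool/L_local - 1), and only the memory-stalled part of runtime feels it. This is a hypothetical back-of-the-envelope model, not the paper's analysis; the latencies and stall fraction below are placeholders, not measurements.

```python
def estimated_slowdown(pooled_access_share, mem_stall_frac, local_ns, pooled_ns):
    """First-order slowdown estimate relative to an all-local run.

    pooled_access_share: fraction of memory accesses served by pooled memory.
        Under a uniform-access assumption this equals the pooled fraction
        of the footprint.
    mem_stall_frac: fraction of the all-local runtime spent stalled on memory.
    Returns a fractional slowdown, e.g. 0.1 means a 10% longer runtime.
    """
    if not 0.0 <= pooled_access_share <= 1.0:
        raise ValueError("pooled_access_share must be in [0, 1]")
    avg_ns = (1.0 - pooled_access_share) * local_ns + pooled_access_share * pooled_ns
    return mem_stall_frac * (avg_ns / local_ns - 1.0)


# Hypothetical inputs, not measurements from the paper.
LOCAL_NS, POOLED_NS, STALL = 100.0, 200.0, 0.4
for share in (0.0, 0.25, 0.5, 0.75):
    slowdown = estimated_slowdown(share, STALL, LOCAL_NS, POOLED_NS)
    print(f"pooled access share {share:.0%}: ~{slowdown:.0%} slower")
```

The uniform-access default (s equal to the pooled fraction of the footprint) is pessimistic whenever frequently used data fits in local memory; that skew is what a memory profiler can reveal.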

Composable Memory Bandwidth

For bandwidth-intensive applications, the study evaluated a high-bandwidth configuration in which additional CXL links raise aggregate memory bandwidth (Figure 3). Applications such as OpenFOAM and Hypre improved significantly, suggesting that CXL-enabled memory systems could be a cost-effective alternative to expensive HBM-based systems.

Figure 3: An emulated high-bandwidth configuration of the memory system. Increased CXL links provide more bandwidth.
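Why more links help can be shown with a simple streaming model: tiers transfer concurrently, so completion time is set by whichever tier finishes its share last. The bandwidth figures are hypothetical placeholders (a PCIe 5.0 x16 link offers roughly 64 GB/s per direction before protocol overhead), and the per-tier byte split is an assumed interleaving policy, not the paper's configuration.

```python
import math


def proportional_shares(bandwidths_gbs):
    """Split bytes across tiers in proportion to each tier's bandwidth."""
    total = sum(bandwidths_gbs)
    return [bw / total for bw in bandwidths_gbs]


def stream_time_s(total_bytes, tiers):
    """Time to stream total_bytes when all tiers transfer concurrently.

    tiers: list of (share_of_bytes, bandwidth_GBps) pairs whose shares sum to 1.
    The tier that takes longest to move its share bounds completion time.
    """
    if not math.isclose(sum(share for share, _ in tiers), 1.0):
        raise ValueError("shares must sum to 1")
    return max(total_bytes * share / (bw * 1e9) for share, bw in tiers if share > 0)


# Hypothetical placeholders: 200 GB/s of local DDR plus CXL links at
# roughly 64 GB/s each.
LOCAL_GBS, LINK_GBS = 200.0, 64.0
data = 328e9  # bytes moved by one sweep of a hypothetical bandwidth-bound kernel
for links in (0, 1, 2, 4):
    bws = [LOCAL_GBS] + [LINK_GBS] * links
    t = stream_time_s(data, list(zip(proportional_shares(bws), bws)))
    print(f"{links} CXL links: {sum(bws):.0f} GB/s aggregate, {t:.2f} s per sweep")
```

Splitting bytes in proportion to tier bandwidth lets every tier finish together, so aggregate bandwidth is the sum of the tiers; an even split leaves the slower CXL tiers as the bottleneck and wastes local bandwidth.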

Interference in Shared Memory Pools

Experiments showed that interference in a shared memory pool can degrade performance, especially for bandwidth-sensitive jobs. Application performance was measured under varying degrees of sharing (Figure 4). The results underscore the need for system-level coordination to mitigate interference and manage shared resources effectively.

Figure 4: An emulated configuration of a memory pool shared by multiple hosts, evaluating the impact of interference.
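One way to reason about interference is to treat the pool's bandwidth as a resource split among hosts. The sketch assumes hypothetical max-min fair arbitration, which shared CXL memory devices do not necessarily provide; it is the kind of policy that system-level coordination might enforce. It models bandwidth only and ignores the extra latency that queuing adds under load.

```python
def max_min_fair(demands_gbs, capacity_gbs):
    """Max-min fair split of a pool's bandwidth among hosts (progressive filling).

    Hosts are visited in order of increasing demand; each receives the smaller
    of its demand and an equal share of the capacity still unallocated.
    """
    n = len(demands_gbs)
    alloc = [0.0] * n
    remaining = capacity_gbs
    for k, i in enumerate(sorted(range(n), key=lambda j: demands_gbs[j])):
        give = min(demands_gbs[i], remaining / (n - k))
        alloc[i] = give
        remaining -= give
    return alloc


def bandwidth_slowdowns(demands_gbs, alloc_gbs):
    """Runtime inflation for purely bandwidth-bound hosts: demand / allocation."""
    return [
        d / a if a > 0 else (1.0 if d == 0 else float("inf"))
        for d, a in zip(demands_gbs, alloc_gbs)
    ]


# Hypothetical: three hosts sharing a pool that can deliver 120 GB/s.
demands = [10.0, 50.0, 100.0]
alloc = max_min_fair(demands, 120.0)
print(alloc, bandwidth_slowdowns(demands, alloc))
```

With 120 GB/s shared by hosts demanding 10, 50, and 100 GB/s, the two lighter hosts are unaffected while the heaviest receives 60 GB/s, so a purely bandwidth-bound job on it would run about 1.67 times longer.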

Conclusion

The paper highlights the promise of CXL-enabled memory pooling in enhancing memory utilization and performance within HPC environments. By decoupling bandwidth and capacity provisioning, CXL allows for customized memory configurations that align with specific workload requirements. Although challenges, such as managing interference on shared pools, remain, the potential benefits in scalable bandwidth and resource efficiency offer a compelling case for further exploration of CXL-enabled systems in HPC contexts.

Markdown Report Issue