Determine optimal Unified Memory pool settings for large-mesh assembly performance

Determine the configuration of the NVIDIA HPC SDK nvc++ Unified Memory pool parameters (NVCOMPILER_ACC_POOL_ALLOC, NVCOMPILER_ACC_POOL_SIZE, and NVCOMPILER_ACC_POOL_THRESHOLD) that maximizes assembly-phase performance of the OpenFOAM laplacianFoam proof-of-concept across mesh sizes and GPU architectures. The goal is to prevent the per-iteration deallocations, and the associated slowdowns, observed under default settings.

Background

For larger meshes (Mesh-L and Mesh-XL), assembly performance degraded under default Unified Memory pool behavior due to frequent deallocations. Adjusting the pool via NVCOMPILER_ACC_POOL_* environment variables significantly improved performance in most cases, indicating that allocator configuration is a key determinant of performance.
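One way to explore these settings systematically is a small sweep harness. The following Python sketch is illustrative only and is not the paper's method. It builds a grid of NVCOMPILER_ACC_POOL_* values, runs a command once per setting, and reports the fastest. The helper names (pool_env_grid, time_command, fastest_setting) are hypothetical, and the candidate values and their accepted formats are assumptions that should be checked against the HPC SDK documentation. It also measures whole-run wall-clock time, which is only a proxy for assembly-phase time; isolating the assembly phase would require the application's own timers.

```python
import itertools
import os
import subprocess
import time


def pool_env_grid(sizes, thresholds, alloc="1"):
    """Return one dict of NVCOMPILER_ACC_POOL_* settings per (size, threshold) pair.

    The variable names come from the text; the values passed in are
    caller-chosen assumptions, not recommended settings.
    """
    return [
        {
            "NVCOMPILER_ACC_POOL_ALLOC": alloc,
            "NVCOMPILER_ACC_POOL_SIZE": size,
            "NVCOMPILER_ACC_POOL_THRESHOLD": threshold,
        }
        for size, threshold in itertools.product(sizes, thresholds)
    ]


def time_command(command, settings):
    """Run `command` once with `settings` layered over the current
    environment and return the elapsed wall-clock time in seconds."""
    env = dict(os.environ)
    env.update(settings)
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


def fastest_setting(command, grid):
    """Time `command` under every setting in `grid` and return the
    (seconds, settings) pair with the smallest elapsed time."""
    timings = [(time_command(command, settings), settings) for settings in grid]
    return min(timings, key=lambda pair: pair[0])
```

A call might look like fastest_setting(["./laplacianFoam", "-case", "meshL"], pool_env_grid(["1GB", "4GB"], ["50", "80"])), where the executable path, case name, and values are placeholders. A single run per setting is noisy, so repeating each measurement and repeating the sweep for each mesh and GPU would be needed before drawing conclusions.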

The paper explicitly defers a deeper analysis of best allocator settings to future work, leaving open the question of how to choose pool parameters systematically for different meshes and platforms to avoid performance cliffs.

References

"Although interesting, this work is not focused on deeply investigating the best allocator size setting and leave this analysis for future work."

Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++ (2507.18268 - Malenza et al., 24 Jul 2025) in Section: Evaluation, Subsection: Performance results on single GPU