A Parallel and Highly-Portable HPC Poisson Solver: Preconditioned Bi-CGSTAB with alpaka
Abstract: This paper presents the design, implementation, and performance analysis of a parallel and GPU-accelerated Poisson solver based on the Preconditioned Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) method. The implementation utilizes the MPI standard for distributed-memory parallelism, while on-node computation is handled using the alpaka framework: this ensures both shared-memory parallelism and inherent performance portability across different hardware architectures. We evaluate the solver's performances on CPUs and GPUs (NVIDIA Hopper H100 and AMD MI250X), comparing different preconditioning strategies, including Block Jacobi and Chebyshev iteration, and analyzing the performances both at single and multi-node level. The execution efficiency is characterized with a strong scaling test and using the AMD Omnitrace profiling tool. Our results indicate that a communication-free preconditioner based on the Chebyshev iteration can speed up the solver by more than six times. The solver shows comparable performances across different GPU architectures, achieving a speed-up in computation up to 50 times compared to the CPU implementation. In addition, it shows a strong scaling efficiency greater than 90% up to 64 devices.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.