Parallelizing a modern GPU simulator

Published 20 Feb 2025 in cs.DC, cs.AR, and cs.PF | (2502.14691v2)

Abstract: Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This paper demonstrates that simulating some GPGPU workloads in a single-threaded state-of-the-art simulator such as Accel-sim can take more than five days. In this paper we present a simple approach to parallelize this simulator with minimal code changes by using OpenMP. Moreover, our parallelization technique is deterministic, so the simulator provides the same results for single-threaded and multi-threaded simulations. Compared to previous works, we achieve a higher speed-up, and, more importantly, the parallel simulation does not incur any inaccuracies. When we run the simulator with 16 threads, we achieve an average speed-up of 5.8x and reach 14x in some workloads. This allows researchers to simulate applications that take five days in less than 12 hours. By speeding up simulations, researchers can model larger systems, simulate bigger workloads, add more detail to the model, increase the efficiency of the hardware platform where the simulator is run, and obtain results sooner.