Value of parallelizing conversion phases

Determine whether parallelizing the AoS-to-SoA and SoA-to-AoS conversion phases (the prologue and epilogue around a compute kernel) yields significant performance improvements in practice. The setting of interest is task-based or multi-rank simulation, where a compute step rarely owns many or all of a node's threads. Characterize the conditions under which such parallelization is beneficial.

Background

In task-based or multi-rank executions, different algorithmic steps may run concurrently on different threads, so only a subset of the node's threads is available to any single compute step. The current prototype performs the conversion prologues and epilogues single-threaded, which raises the question of whether parallelizing these phases would help.
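To make the three phases concrete, the following is a minimal Python sketch of a single compute step with a single-threaded prologue and epilogue. It only illustrates the structure and is not the prototype's compiler-generated code. The names `Particle`, `to_soa`, `from_soa`, and `step` are hypothetical, as is the two-field layout.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    # Hypothetical AoS element: one object holds all fields of one particle.
    x: float
    v: float


def to_soa(particles):
    """Prologue: gather each field of the AoS into its own contiguous list."""
    return {
        "x": [p.x for p in particles],
        "v": [p.v for p in particles],
    }


def from_soa(soa, particles):
    """Epilogue: scatter the SoA fields back into the original AoS objects."""
    for p, x, v in zip(particles, soa["x"], soa["v"]):
        p.x = x
        p.v = v


def step(particles, dt):
    """One compute step: prologue, kernel on the SoA view, epilogue."""
    soa = to_soa(particles)  # single-threaded conversion in
    # Kernel: a simple position update, standing in for the real compute.
    soa["x"] = [x + dt * v for x, v in zip(soa["x"], soa["v"])]
    from_soa(soa, particles)  # single-threaded conversion out
```

Both conversions touch every element once and do no arithmetic, so their cost is dominated by memory traffic. The open question is whether splitting these two loops across the few threads a step owns would shorten the step noticeably.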

The authors note that it might be reasonable to parallelize the conversions, but explicitly state that it is not clear whether this adds significant value, because compute steps rarely own many or all threads. Establishing when such parallelization pays off requires empirical evaluation and modeling of contention, memory bandwidth, and parallel overheads.
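One way to frame the modeling part is a back-of-the-envelope cost model. The sketch below is an assumption for illustration, not a model from the paper: it treats a conversion as purely memory-bound, caps the achievable bandwidth at the share of node bandwidth left over by concurrently running steps, and charges a fixed fork/join cost per extra thread. All names and parameters (`ConversionModel`, `bw_share`, `spawn_overhead`, `best_thread_count`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConversionModel:
    bytes_moved: float     # bytes read plus written by one conversion
    per_thread_bw: float   # bytes/s that a single thread can sustain
    bw_share: float        # bytes/s of node bandwidth available to this step
    spawn_overhead: float  # seconds of fork/join cost per additional thread

    def time(self, threads: int) -> float:
        """Estimated conversion time when the step uses `threads` threads."""
        if threads < 1:
            raise ValueError("threads must be at least 1")
        # Bandwidth scales with threads until the step's share saturates.
        bandwidth = min(threads * self.per_thread_bw, self.bw_share)
        overhead = (threads - 1) * self.spawn_overhead
        return self.bytes_moved / bandwidth + overhead

    def speedup(self, threads: int) -> float:
        """Estimated speedup over the single-threaded conversion."""
        return self.time(1) / self.time(threads)


def best_thread_count(model: ConversionModel, available: int) -> int:
    """Thread count in 1..available that minimizes the modeled time."""
    return min(range(1, available + 1), key=model.time)
```

Under these assumptions, parallel conversion helps only when the data set is large enough to amortize the fork/join cost and the step's bandwidth share exceeds what one thread can already saturate. Contention from other steps enters through `bw_share`. Measured values for these parameters would be needed before drawing conclusions for a real code.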

References

It might be reasonable to parallelise the conversion into SoA and back. In practice, it is not clear if this adds significant value as a step rarely 'owns' many or all threads.

Compiler support for semi-manual AoS-to-SoA conversions with data views (2405.12507 - Radtke et al., 2024) in Section 5 (Benchmark results), Context paragraph