A Multi-Threaded Version of MCFM

Published 20 Mar 2015 in physics.comp-ph, cs.DC, cs.MS, and hep-ph | (1503.06182v1)

Abstract: We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Special care has been taken that the results of the Monte Carlo integration are independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.

Citations (231)

Summary

The paper adds OpenMP multi-threading to MCFM, substantially accelerating its Monte Carlo integration without compromising numerical accuracy.
The authors parallelize the VEGAS integration in MCFM by embedding OpenMP directives into FORTRAN code while carefully managing shared and thread-local variables.
Performance tests on several platforms show speed-ups that grow with the number of threads. On the Intel Xeon Phi coprocessor the speed-up for the NLO calculation exceeds a factor of 100, which highlights the scalability of the approach.

Overview of Multi-Threaded MCFM

The paper "A Multi-Threaded Version of MCFM" describes how multi-threading based on the OpenMP standard was added to MCFM (Monte Carlo for FeMtobarn processes). Parallel execution lets MCFM exploit modern multi-core processors and substantially reduces the run time of its Monte Carlo integrations. The modifications keep the numerical results independent of the number of threads, which simplifies validation.

Implementation Details

To add multi-threading, the authors used OpenMP, a widely adopted API for shared-memory multiprocessing on many platforms. They embedded OpenMP directives in MCFM's FORTRAN code, which enabled parallel execution without substantial changes to the code. Most of the effort went into handling data structures in a parallel setting, since every variable had to be classified as either shared between threads or private to each thread. The authors used OpenMP critical and atomic constructs to serialize random-number generation and the summation of results. This prevents concurrent threads from corrupting shared state and keeps the outcome deterministic.
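MCFM itself is written in FORTRAN with OpenMP directives. The sketch below is only a minimal Python analogue of the pattern described above. It assumes a toy integrand $f(x) = x^2$ on $[0, 1)$ in place of a real matrix element. Variables local to `work` play the role of thread-private data, while `rng` and `total` are shared between threads. A `threading.Lock` stands in for the OpenMP `critical` and `atomic` constructs. The names `integrand`, `mc_integrate`, and `work` are hypothetical. Because of CPython's global interpreter lock, the sketch illustrates the bookkeeping rather than a real speed-up.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def integrand(x: float) -> float:
    """Hypothetical stand-in for a matrix element: f(x) = x**2 on [0, 1)."""
    return x * x


def mc_integrate(n_events: int, n_threads: int, seed: int = 1) -> float:
    rng = random.Random(seed)      # shared: one generator state for all threads
    rng_lock = threading.Lock()    # analogue of an OpenMP critical section
    total = 0.0                    # shared: accumulated integrand values
    total_lock = threading.Lock()  # analogue of an atomic update / reduction

    def work(n_local: int) -> None:
        nonlocal total
        partial = 0.0              # thread-local: each call has its own copy
        for _ in range(n_local):
            with rng_lock:         # one thread at a time advances the generator
                x = rng.random()
            partial += integrand(x)
        with total_lock:           # fold the partial sum into the shared total
            total += partial

    # Split the events as evenly as possible over the threads.
    base, extra = divmod(n_events, n_threads)
    chunks = [base + (1 if t < extra else 0) for t in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, chunks))   # list() waits and re-raises worker errors
    return total / n_events
```

Every draw from the shared generator is serialized, and exactly `n_events` draws are made in total. As a result, `mc_integrate(100_000, 1)` and `mc_integrate(100_000, 4)` consume the same random numbers and agree up to floating-point rounding. The assignment of numbers to threads, and therefore the order of the additions, can still change from run to run.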

The paper pays particular attention to the VEGAS integration routine. The modified VEGAS distributes event evaluation across the threads and combines all events at the end of every iteration, so that the grid adaptation for the next iteration uses the full event sample. The results are also kept independent of the number of threads, which is essential for validating the parallel version against the original serial MCFM.
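A thread-count-independent variant can be sketched as follows. The sketch rests on two assumptions that are not taken from MCFM. First, each event draws its random point from a generator seeded by the event index. Second, each event's weight is stored in its own slot, so that all events can be combined in a fixed order at the end of the iteration. The names `event_weight` and `run_iteration`, together with the toy integrand, are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def integrand(x: float) -> float:
    """Hypothetical stand-in for a matrix element: f(x) = x**2 on [0, 1)."""
    return x * x


def event_weight(seed: int, iteration: int, event: int) -> float:
    # Assumption: each event has its own generator, seeded from
    # (seed, iteration, event), so its point does not depend on which
    # thread evaluates it or in which order events are processed.
    rng = random.Random(f"{seed}:{iteration}:{event}")
    return integrand(rng.random())


def run_iteration(n_events: int, n_threads: int, seed: int = 1,
                  iteration: int = 0) -> tuple[float, float]:
    weights = [0.0] * n_events         # one slot per event, written by one thread

    def work(start: int, stop: int) -> None:
        for i in range(start, stop):
            weights[i] = event_weight(seed, iteration, i)

    step = -(-n_events // n_threads)   # ceiling division: events per thread
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(work, s, min(s + step, n_events))
                   for s in range(0, n_events, step)]
        for f in futures:
            f.result()                 # wait, and re-raise any worker error

    # Combine all events at the end of the iteration, always in event order,
    # so the floating-point sums are identical for every thread count.
    mean = sum(weights) / n_events
    var = sum((w - mean) ** 2 for w in weights) / (n_events - 1)
    return mean, (var / n_events) ** 0.5   # estimate and its standard error
```

Here `run_iteration(20_000, 1)` and `run_iteration(20_000, 8)` return identical tuples, not merely close ones. In a full VEGAS implementation, the same ordered pass over all events would also fill the per-bin histograms that drive the grid refinement. The cost of this design is memory proportional to the number of events per iteration.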

Performance Evaluation

The researchers benchmarked the code on several hardware configurations: an Intel Core i7-4770, two Intel Xeon X5650 processors, four AMD Opteron 6128 HE processors, and an Intel Xeon Phi 5110P coprocessor. Performance was evaluated for the process $pp \to H(\to b\bar{b}) + 2$ jets at both leading order (LO) and next-to-leading order (NLO).

The speed-up grows with the number of threads and is largest for the computationally intensive NLO calculation. The Intel Xeon Phi coprocessor, for instance, reached an acceleration factor of more than 100 for the NLO evaluation, even though each of its cores is slower than a conventional processor core. These results show that multi-threading is a practical way to cut the run time of Monte Carlo calculations in high-energy physics on a range of hardware architectures.
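To make "acceleration factor" concrete, assume it is the ratio of wall-clock times for one thread and for $N$ threads on the same machine. This is an assumption, since the summary does not state the baseline. Amdahl's law then bounds what a factor above 100 implies:

$$
S(N) = \frac{T(1)}{T(N)}, \qquad E(N) = \frac{S(N)}{N}, \qquad S(N) \le \frac{1}{(1-p) + p/N} < \frac{1}{1-p},
$$

Here $p$ is the fraction of the single-thread run time that parallelizes perfectly, and $E(N)$ is the parallel efficiency. A speed-up $S > 100$ therefore requires $1/(1-p) > 100$, that is, $p > 0.99$, so less than 1% of the work can remain serial. This is plausible when expensive per-event evaluations dominate the run time, as they do in the NLO calculation.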

Implications and Future Prospects

Parallelizing MCFM with OpenMP makes more complex processes feasible within reasonable run times. The work also shows why software must adapt to the evolution of hardware. Single-core processor speeds have largely stopped increasing, so performance gains now come mainly from additional cores. Multi-threaded programs such as this version of MCFM can therefore keep benefiting from new hardware for as long as the transistor growth described by Moore's Law is delivered as more cores per chip.

Future work could extend the approach to more complex processes or to next-to-next-to-leading order (NNLO) calculations, taking advantage of new processor generations such as later Xeon Phi products. The paper thus lays a foundation for scalable Monte Carlo calculations on current and future hardware.

Markdown Report Issue