Papers
Topics
Authors
Recent
Search
2000 character limit reached

A performance evaluation of CCS QCD Benchmark on the COMA (Intel(R) Xeon Phi$^{TM}$, KNC) system

Published 20 Dec 2016 in hep-lat and physics.comp-ph | (1612.06556v1)

Abstract: The most computationally demanding part of Lattice QCD simulations is solving quark propagators. Quark propagators are typically obtained with a linear equation solver utilizing HPC machines. The CCS QCD Benchmark is a benchmark program solving the Wilson-Clover quark propagator, and is developed at the Center for Computational Sciences (CCS), University of Tsukuba. We optimized the benchmark program for a \Intel \XeonPhi (Knights Corner, KNC) system named "COMA (PACS-IX)" at CCS Tsukuba under the Intel Parallel Computing Center program. A single precision BiCGStab solver with the overlapped Restricted Additive Schwarz (RAS) preconditioner was implemented using SIMD intrinsics, OpenMP and MPI in the offload mode. With the reverse-offloading technique, we could reduce the communication and offloading overheads. We observed a performance of $\sim 200$ GFlops sustained for the Wilson-Clover hopping matrix multiplication on the lattice sizes larger than $243\times 32$ on a sinlge card of the COMA system. A good weak scaling perofmace was observed on the local lattice sizes larger than $243\times 32$.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.