High level implementation of geometric multigrid solvers for finite element problems: applications in atmospheric modelling

Published 2 May 2016 in cs.MS, math.NA, and physics.flu-dyn | (1605.00492v2)

Abstract: The implementation of efficient multigrid preconditioners for elliptic partial differential equations (PDEs) is a challenge due to the complexity of the resulting algorithms and corresponding computer code. For sophisticated finite element discretisations on unstructured grids an efficient implementation can be very time consuming and requires the programmer to have in-depth knowledge of the mathematical theory, parallel computing and optimisation techniques on manycore CPUs. In this paper we show how the development of bespoke multigrid preconditioners can be simplified significantly by using a framework which allows the expression of the each component of the algorithm at the correct abstraction level. Our approach (1) allows the expression of the finite element problem in a language which is close to the mathematical formulation of the problem, (2) guarantees the automatic generation and efficient execution of parallel optimised low-level computer code and (3) is flexible enough to support different abstraction levels and give the programmer control over details of the preconditioner. We use the composable abstractions of the Firedrake/PyOP2 package to demonstrate the efficiency of this approach for the solution of strongly anisotropic PDEs in atmospheric modelling. The weak formulation of the PDE is expressed in Unified Form Language (UFL) and the lower PyOP2 abstraction layer allows the manual design of computational kernels for a bespoke geometric multigrid preconditioner. We compare the performance of this preconditioner to a single-level method and hypre's BoomerAMG algorithm. The Firedrake/PyOP2 code is inherently parallel and we present a detailed performance analysis for a single node (24 cores) on the ARCHER supercomputer. Our implementation utilises a significant fraction of the available memory bandwidth and shows very good weak scaling on up to 6,144 compute cores.