MFEM Library: Modular Finite Elements
- MFEM is an open-source C++ library for finite element discretizations that emphasizes modularity, high-order accuracy, and efficient, scalable performance.
- It supports a wide range of methods including continuous, mixed, and discontinuous Galerkin approaches with matrix-free and partial assembly techniques.
- Designed for exascale environments, MFEM enables parallel computations on CPUs and GPUs and has been validated across numerous scientific and engineering applications.
The MFEM (Modular Finite Element Methods) Library is an open-source, high-performance C++ library for finite element discretizations. It provides researchers and practitioners with a flexible, modular, and scalable platform for developing and applying advanced finite element methods in computational science and engineering. MFEM supports arbitrary high-order elements and a variety of discretization strategies, and is architected for efficiency and portability across leadership-class parallel computing facilities, including exascale GPU-accelerated machines (Andrej et al., 2024, Anderson et al., 2019).
1. Design Philosophy and Core Architecture
MFEM is structured around a clean, layered, object-oriented architecture in which fundamental mathematical entities such as meshes, finite element spaces, bilinear forms, and solvers are encapsulated in discrete, composable classes. The principal design goals are:
- Modularity: Separation of core abstractions for mesh, polynomial space, variational forms, and linear algebra facilitates rapid prototyping and algorithmic extensions.
- High-order accuracy: Full support for arbitrarily high polynomial order in 1D, 2D, and 3D, for all standard spaces in the de Rham complex — $H^1$, $H(\mathrm{curl})$, $H(\mathrm{div})$, $L^2$ — including advanced spaces such as NURBS for isogeometric analysis.
- Performance and scalability: Matrix-free operator evaluation, sum-factorization, partial-assembly techniques, and support for distributed memory (MPI) and device backends (CUDA, HIP, SYCL, OpenMP, RAJA) targeting CPUs and GPUs.
- Extensibility and usability: Hierarchical base classes permit user-defined finite elements, integrators, and solvers. Discretizations and physics modules may be composed at runtime (Anderson et al., 2019, Andrej et al., 2024, Cruz, 2022, Cruz, 2021).
MFEM’s parallel mesh objects (ParMesh) and corresponding spaces (ParFiniteElementSpace, ParGridFunction, etc.) enable domain decomposition over thousands of MPI ranks, supporting hybrid mesh types with automatic load balancing via METIS or similar partitioners (Martinez-Weissberg et al., 30 Dec 2025, Andrej et al., 2024).
2. Supported Finite Element Methods and Variational Forms
MFEM enables discretization of PDEs using a wide suite of element and function space types:
- Continuous Galerkin ($H^1$): Lagrange nodal bases of arbitrary degree for scalar and vector problems. Variational forms are assembled via integrators such as DiffusionIntegrator (for stiffness) and MassIntegrator.
- Mixed ($H(\mathrm{div})$–$L^2$): Raviart–Thomas spaces for flux variables coupled to discontinuous $L^2$ spaces for pressures, used in Darcy, Stokes, and elasticity saddle-point systems, implemented via MixedBilinearForm and associated integrators (Cruz, 2022, Cruz, 2021).
- Discontinuous Galerkin ($L^2$): Interior penalty, upwind, and DPG methods via upwind and trace integrators.
- Isogeometric (NURBS): NURBS_FECollection supports high-regularity spline bases for geometric and solution fields (Andrej et al., 2024).
- Hybrid, mortaring, and DPG: Addition of mortar spaces and hybridization for domain decomposition and DPG-style methods.
- Tensor-product and simplex elements: Both structured/quadrilateral/hexahedral and unstructured/triangular/tetrahedral meshes are natively supported.
A typical workflow proceeds: mesh acquisition and refinement → FiniteElementCollection and FiniteElementSpace selection → variational problem assembly via BilinearForm/LinearForm/MixedBilinearForm → system elimination, solution, and post-processing (Anderson et al., 2019, Andrej et al., 2024, Cruz, 2022, Cruz, 2021).
3. Assembly Strategies and Matrix-Free Technologies
MFEM’s assembly strategies are explicitly designed for performance portability and memory efficiency:
- Full (sparse) assembly: Stores the global matrix in CSR format; widely used for problems that fit in memory or for compatibility with algebraic multigrid solvers (e.g., Hypre BoomerAMG).
- Partial/matrix-free assembly: At high polynomial order or extreme scale, MFEM employs sum-factorization and partial assembly, storing only the element-level quadrature data and computing the basis actions $B$, $B^T$ on the fly. The finite element operator is applied as $A = P^T G^T B^T D B G P$, where $P$ is the parallel restriction/prolongation, $G$ is the element restriction, and $B$, $D$ are basis and quadrature data, respectively (Andrej et al., 2024, Vargas et al., 2021).
- Element-by-element (EBE): For massive models, explicit local element blocks are avoided entirely. MFEM’s operator products are implemented through direct action on local vectors, exploiting tensor-product structure for $O(p^{d+1})$ per-element complexity versus $O(p^{2d})$ for full assembly (Vargas et al., 2021).
MFEM’s matrix-free infrastructure extends to multilevel preconditioners, such as voxel-structured multigrid for biomechanical simulation (Martinez-Weissberg et al., 30 Dec 2025) and low-order-refined (LOR) AMG for $H^1$, $H(\mathrm{curl})$, and $H(\mathrm{div})$ spaces (Andrej et al., 2024).
Comparison of assembly and solver regimes in applications demonstrates that EBE reduces memory by roughly $3$–$4\times$ (see the table in Section 6) at the expense of higher iterative solver counts, making simulation of hundreds of millions of DOFs feasible on moderate, commodity HPC clusters (Martinez-Weissberg et al., 30 Dec 2025).
4. High-Performance Parallelism and Accelerator Portability
MFEM’s parallel model is built for strong and weak scaling on leadership-class HPC resources:
- Domain decomposition: Mesh partitioning and ghost layer management are handled in ParMesh, with element and DOF ownership tables to support parallel assembly and shared boundaries (Martinez-Weissberg et al., 30 Dec 2025, Andrej et al., 2024).
- Distributed memory (MPI): ParFiniteElementSpace and ParGridFunction distribute DOFs; HypreParMatrix and operator wrappers enable algebraic solvers in parallel.
- Threading and GPU backends: Assembly and operator action loops transparently offload to OpenMP, CUDA, HIP, or SYCL backends; device selection and memory synchronization are managed via mfem::Device and seamless data transfer routines (Andrej et al., 2024, Vargas et al., 2021).
MFEM’s integration with the CEED ecosystem (libCEED, RAJA, Umpire) provides BLAS-like kernel optimization, memory management, and hierarchical shared-memory kernel launching, yielding CEED Bake-Off Problem-level peak performance on NVIDIA V100 and AMD MI250X GPUs (Vargas et al., 2021, Andrej et al., 2024). Kernel fusion and advanced backend switching enable order-of-magnitude improvements in throughput and strong scalability (e.g., 5,000 MDOFs/s on MI250X, with robust strong scaling across nodes and GPU ranks).
5. Algorithmic Examples and Application Benchmarks
MFEM serves as the discretization and solver engine for a broad suite of scientific and engineering codes:
- Biomechanical μFE modeling: Large-scale, high-resolution voxel-based simulations of bone mechanics, solving linear elasticity problems with hundreds of millions of DOFs, verified against both commercial solvers and experimental DIC data (Martinez-Weissberg et al., 30 Dec 2025).
- Compressible and incompressible flow: Codes such as Laghos (Lagrangian hydrodynamics) and Hydra utilize MFEM’s high-order operator infrastructure (Andrej et al., 2024, Anderson et al., 2019).
- Electromagnetics and plasma: Petra-M, Palace, and fusion applications leverage MFEM’s $H(\mathrm{curl})$ and $H(\mathrm{div})$ spaces with mixed-hybrid discretizations.
- Topology optimization, mesh optimization (TMOP), and multiscale methods: GPU-accelerated TMOP achieves substantial single-GPU speedups for high-order meshes; MFEM also underlies large-scale topology optimization workflows (Andrej et al., 2024, Anderson et al., 2019).
- Navier–Stokes workflows: The NavierSolver mini-app demonstrates the coupling of mass/momentum blocks, mesh refinement, and solver composition for both steady and unsteady 2D/3D turbulent flows (Cruz, 2022).
- Saddle-point and mixed-variational problems: Implementation of block preconditioners, MINRES solvers, and the full suite of auxiliary block operator and solver classes, with detailed usage for both primal and mixed Laplace forms (Cruz, 2021).
Typical C++ code for MFEM usage follows a predictable and succinct pattern: mesh construction, finite element space definition, operator assembly, preconditioning and solution, with emphasis on minimizing code required to transition from serial to fully parallel or device-hosted execution (Andrej et al., 2024, Anderson et al., 2019, Martinez-Weissberg et al., 30 Dec 2025).
6. Validation, Performance Metrics, and Best Practices
Best practices in the deployment and verification of MFEM-based simulations are well established:
- Verification: Rigorous comparison with commercial codes such as Abaqus, regression of displacement and strain fields (slope and correlation against reference data), and convergence checks across mesh resolutions and segmentation parameters (Martinez-Weissberg et al., 30 Dec 2025).
- Performance tuning: Assembly level (full vs. partial), multigrid depth (min mesh dimension), Chebyshev smoother order, and METIS partitioning for mesh balance are key levers (Martinez-Weissberg et al., 30 Dec 2025). Variable order ($p$) and mesh refinement ($h$) are flexibly handled.
- Preconditioning: For assembled systems, Hypre BoomerAMG is recommended; for matrix-free problems, custom multigrid built atop MFEM’s operator abstraction; saddle-point problems use block or Schur complement preconditioning (Cruz, 2022, Andrej et al., 2024).
- Post-processing: Efficient extraction of boundary fields via SubMesh and ParGridFunction::GetSubVector, use of Gaussian smoothing to compare strain fields across resolutions or experimental validation (Martinez-Weissberg et al., 30 Dec 2025).
Performance metrics from large-scale models illustrate MFEM’s capabilities:
| Model | Assembly | Memory | Solve Time | CG Iterations |
|---|---|---|---|---|
| L20 (811M DOFs) | Full | 2.8 TB | 245 min | 840 |
| L20 | Element-by-element | 0.7 TB | 399 min | 2218 |
| L40 (133M DOFs) | Full | 499 GB | 11.4 min | 302 |
| L40 | Element-by-element | 159 GB | 17.5 min | 709 |
Partial assembly and matrix-free techniques routinely reduce problem memory requirements severalfold ($3$–$4\times$ in the benchmarks above), making feasible the simulation of anatomical-scale structures and multiscale domains (Martinez-Weissberg et al., 30 Dec 2025).
7. Impact and Future Directions
MFEM is the backbone of numerous exascale and emerging research codes across DOE, academia, and industry. Its close integration with CEED, continual contributions from the MARBL and Laghos projects, and ongoing API and kernel optimizations underpin its sustained relevance (Andrej et al., 2024, Vargas et al., 2021). Extensions to non-conforming refinements, mixed-mesh and p-adaptivity, kernel fusion and strong scaling improvements, and advanced matrix-free preconditioners constitute active areas of development.
MFEM’s design enables straightforward use in both rapid prototyping (for new discretizations and numerical algorithms) and highly optimized production codes, with scalable performance benchmarks demonstrated up to and including exascale-class GPU architectures (Andrej et al., 2024, Vargas et al., 2021, Anderson et al., 2019).
References:
(Martinez-Weissberg et al., 30 Dec 2025, Andrej et al., 2024, Vargas et al., 2021, Cruz, 2021, Anderson et al., 2019, Cruz, 2022)