A Data-Informed Local Subspaces Method for Error-Bounded Lossy Compression of Large-Scale Scientific Datasets
Abstract: The growing volume of scientific simulation data presents a significant challenge for storage and transfer. Error-bounded lossy compression has emerged as a critical solution for mitigating these challenges, providing a means to reduce data size while ensuring that reconstructed data remains valid for scientific analysis. In this paper, we present a data-driven scientific data compressor, called Discontinuous Data-informed Local Subspaces (Discontinuous DLS), to improve compression-to-error ratios over data-agnostic compressors. This error-bounded compressor leverages localized spatial and temporal subspaces, informed by the underlying data structure, to enhance compression efficiency and preserve key features. The presented technique is flexible and applicable to a wide range of scientific data, including fluid dynamics, environmental simulations, and other high-dimensional, time-dependent datasets. We describe the core principles of the method and demonstrate its ability to significantly reduce storage requirements without compromising critical data fidelity. The technique is implemented in a distributed computing environment using MPI, and its performance is evaluated against state-of-the-art error-bounded compression methods in terms of compression ratio and reconstruction accuracy. This study highlights discontinuous DLS as a promising approach for large-scale scientific data compression in high-performance computing environments, providing a robust solution for managing the growing data demands of modern scientific simulations.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.