Papers
Topics
Authors
Recent
Search
2000 character limit reached

Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library

Published 18 Jun 2025 in cs.DC | (2506.15114v1)

Abstract: High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata such as data types and dimensionality along with the raw data in the same files. While these libraries are well-optimized for concurrent access to the raw data, they are designed neither to handle a large number of data objects efficiently nor to create different data objects independently by multiple processes, as they require applications to call data object creation APIs collectively with consistent metadata among all processes. Applications that process data gathered from remote sensors, such as particle collision experiments in high-energy physics, may generate data of different sizes from different sensors and desire to store them as separate data objects. For such applications, the I/O library's requirement on collective data object creation can become very expensive, as the cost of metadata consistency check increases with the metadata volume as well as the number of processes. To address this limitation, using PnetCDF as an experimental platform, we investigate solutions in this paper that abide the netCDF file format, as well as propose a new file header format that enables independent data object creation. The proposed file header consists of two sections, an index table and a list of metadata blocks. The index table contains the reference to the metadata blocks and each block stores metadata of objects that can be created collectively or independently. The new design achieves a scalable performance, cutting data object creation times by up to 582x when running on 4096 MPI processes to create 5,684,800 data objects in parallel. Additionally, the new method reduces the memory footprints, with each process requiring an amount of memory space inversely proportional to the number of processes.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.