
IBM Storage Scale (GPFS)

Updated 16 January 2026
  • IBM Storage Scale (GPFS) is a distributed, POSIX-compliant parallel file system designed for high-performance, multi-tiered storage environments.
  • It integrates diverse hardware from RAID arrays to tape libraries into a unified namespace with automated lifecycle and tiered placement policies.
  • It delivers robust performance through features like synchronous mirroring, live hardware reconfiguration, and scalable metadata management for large deployments.

IBM Storage Scale (GPFS), formerly known as IBM’s General Parallel File System, is a distributed, POSIX-compliant parallel file system designed for large-scale, high-performance storage environments. It aggregates diverse storage hardware, including RAID arrays, DAS, SAN, and tape, into a unified, scalable namespace under flexible management. GPFS underpins multi-petabyte research infrastructures such as the Target project, offering robust support for mixed I/O workloads, advanced tiered storage configuration, automated information lifecycle management (ILM), and high concurrency across thousands of compute and database nodes (Belikov et al., 2011).

1. Architecture and Topology

GPFS is typically deployed as a multi-site, multi-tier cluster spanning geographically dispersed data centers. In the Target infrastructure, it comprises two failure groups (Data Centre A and Data Centre B), interconnected via a redundant 40 Gb/s Ethernet ring composed of dual 10 GbE links and fiber trunks. This ring supports both GPFS cluster traffic and Oracle RAC interconnects. Key architectural components include:

  • GPFS metadata and client servers (e.g., IBM x3650) distributed across both sites.
  • Storage tiers:
    • Tier A: SAS RAID for databases (mirrored).
    • Tier B: SATA DAS/RAID for large sequential workloads.
    • Tier C: Fibre-Channel RAID for high-IOPS, random-access workloads.
    • Tier D: LTO tape libraries for nearline archiving.
  • All pools federated under a single GPFS namespace, accessed via POSIX interfaces by compute, application, and database servers.

The topology supports synchronous mirroring of Tier A LUNs and access balancing across other pools. Critical metadata and data are mirrored for resilience, and federation allows seamless load balancing and access across thousands of Linux clients.

2. Configuration Parameters and ILM Policies

GPFS configuration in the Target system is optimized for multi-petabyte scale, heterogeneous hardware, and workload diversity. The default extent (block) size is 4 MiB, with a stripe width of 8 MiB and a stripe count of 4. Large files are striped across multiple disks to distribute load.
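The round-robin effect of striping can be sketched in a few lines. This is an illustrative model only, not GPFS's actual allocator; the 4 MiB block size and stripe count of 4 follow the Target configuration above, while the NSD names are hypothetical.

```python
# Hypothetical sketch of round-robin striping: map file block offsets to
# disks in a pool. Illustrative only -- not GPFS's real allocation logic.

BLOCK_SIZE = 4 * 1024 * 1024   # 4 MiB extents (Target default)
STRIPE_COUNT = 4               # disks a single file is spread across

def block_to_disk(file_offset: int, disks: list[str]) -> str:
    """Return the disk holding the block that contains file_offset."""
    block_index = file_offset // BLOCK_SIZE
    return disks[block_index % min(STRIPE_COUNT, len(disks))]

pool = ["nsd1", "nsd2", "nsd3", "nsd4", "nsd5", "nsd6"]  # hypothetical NSDs

# A 16 MiB file occupies 4 blocks, one per striped disk.
layout = [block_to_disk(off, pool)
          for off in range(0, 16 * 1024 * 1024, BLOCK_SIZE)]
print(layout)  # ['nsd1', 'nsd2', 'nsd3', 'nsd4']
```

Sequential reads of such a file can then proceed from four disks in parallel, which is what the stripe-count term in the throughput model of Section 5 captures.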

Tiered storage pools are defined as follows:

| Tier | Hardware | Purpose |
|------|----------|---------|
| Tier A | IBM DS3200 SAS RAID | Oracle DB, hot data (mirrored) |
| Tier B | IBM DS9900 SATA RAID | LOFAR archives, large sequential I/O |
| Tier C | IBM DS5300 FC RAID | Small-file AI ingest, high IOPS |
| Tier D | IBM TS3500 LTO-4 tape | Nearline archive |

Placement policies for incoming files allocate pools based on owner, project, and file size. For example, Oracle data goes to Tier A; LOFAR project files to Tier B; small files (<100 MiB) to Tier C. ILM policies automate age-based migration—e.g., files older than 180 days in Tier B are migrated to tape (Tier D), and infrequently accessed files in Tier C are archived.

Redundancy is achieved via synchronous mirroring (zero-RPO) for Tier A and array-level RAID plus GPFS placement across multiple disks for Tiers B and C. Cross-site replication may be enabled at the project level.

3. Hardware Characteristics and Data Placement Optimization

Target’s initial GPFS hardware layout included:

| Tier | Model | Media | Connectivity | Raw Capacity |
|------|-------|-------|--------------|--------------|
| Tier A | IBM DS3200 | SAS, 2 TB × 28 | SAS to hosts | 56 TB |
| Tier B | IBM DS9900 | SATA, 1 TB × 600 | 8 Gb FC SAN | 600 TB |
| Tier C | IBM DS5300 | FC, 600 GB × 224 | 8 Gb FC SAN | 134.4 TB |
| Tier D | IBM TS3500 library | LTO-4 tapes | 8 Gb FC SAN | 900 TB (tape) |

Dedicated dual-redundant controllers and multipath networks support high bandwidth and resilient operation. Files are striped across disks to balance high-throughput sequential operations and high-IOPS random-access patterns. Metadata I/O is preferentially directed to Tier C during intensive lookup workloads, while large streaming workloads are handled primarily by Tier B.

4. Scalability Strategies

Adding storage or compute capacity to GPFS is an online procedure: racking disks and servers, defining new storage pools, assigning new LUNs, and issuing GPFS volume-addition commands. Optional rebalancing redistributes existing data onto the new spindles. The metadata model is decentralized: inodes are spread across the disks designated for metadata and scale with hardware additions. Architectural limits are an 8 EB namespace, more than 1 billion files per filesystem, and up to 64 million files per directory; Target workloads have peaked at around 100 million files.
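A back-of-the-envelope model gives a feel for the cost of the optional rebalance step. When a pool of identical disks grows and data is redistributed to a uniform layout, at minimum a fraction n_new / (n_old + n_new) of the stored data must move; this is an idealized lower bound, not GPFS's actual restripe behavior, and the example figures are hypothetical.

```python
# Idealized lower bound on data movement for an online pool expansion.
# Assumes identical disks and rebalancing to a uniform distribution.

def min_fraction_moved(n_old: int, n_new: int) -> float:
    """Smallest fraction of stored data that must relocate."""
    return n_new / (n_old + n_new)

def rebalance_bytes(stored_bytes: float, n_old: int, n_new: int) -> float:
    """Minimum bytes moved when n_new disks join a pool of n_old disks."""
    return stored_bytes * min_fraction_moved(n_old, n_new)

TB = 1e12
# Hypothetical: adding 120 spindles to a 600-spindle pool holding 400 TB.
print(min_fraction_moved(600, 120))              # ≈ 0.167
print(rebalance_bytes(400 * TB, 600, 120) / TB)  # ≈ 66.7 TB moved
```

Because this movement scales with pool size, throttled background rebalancing (as described in Section 6) keeps the sweep from saturating the interconnect.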

Recovery and rebalancing scale linearly with disk count per pool and do not require interruption of service.

5. Performance Metrics and Modeling

Benchmarking within Target exhibits distinctive performance behaviors:

  • Small-file random-access (Monk AI ingest benchmark, 67,500 files): Best IOPS (≈7,000) and lowest latency (9 ms) when both metadata and data reside on Tier C. Striping across Tier B/C approaches these metrics at reduced cost. Block size (1–8 MiB) impacts IOPS by ≤5% for small files.
  • Large sequential streaming (LOFAR): Sustained throughput of ≈4.8 GB/s read and 4.4 GB/s write on a 25 TB dataset across 120 Tier B spindles; latency to first byte ≈2 ms.

GPFS throughput is modeled as:

$$T_{seq} \simeq \min(N, S) \times B$$

where $N$ is the disk count, $S$ is the stripe count, and $B$ is the per-disk bandwidth.
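The sequential-throughput model translates directly into a small function. The per-disk bandwidth below is an assumed figure for illustration; the min(N, S) term captures that a single stream cannot use more disks than its stripe count.

```python
# Single-stream sequential throughput model: T_seq ≈ min(N, S) × B.

def seq_throughput(n_disks: int, stripe_count: int, per_disk_bw: float) -> float:
    """Estimated streaming rate, in the same units as per_disk_bw."""
    return min(n_disks, stripe_count) * per_disk_bw

# Hypothetical: 120 Tier B spindles, stripe count 4, 150 MB/s per disk.
print(seq_throughput(120, 4, 150.0))  # 600.0 MB/s for one stream
```

Aggregate figures such as the ≈4.8 GB/s LOFAR read rate arise from many concurrent streams, each striped over a different subset of the 120 spindles.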

Aggregate IOPS for concurrent random-access clients:

$$\mathrm{IOPS}_{agg} \simeq C \times \frac{1}{\alpha}$$

where $C$ is the client count and $\alpha$ is the average per-disk seek latency.
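The IOPS model assumes each client completes roughly one random I/O per seek interval, with enough spindles that clients do not queue behind one another. The 63-client figure below is a hypothetical choice that reproduces the magnitude of the ≈7,000 IOPS Tier C result at the measured ≈9 ms latency.

```python
# Aggregate random-access model: IOPS_agg ≈ C × (1 / α), α in seconds.

def aggregate_iops(clients: int, seek_latency_s: float) -> float:
    """Estimated aggregate operations per second for C concurrent clients."""
    return clients / seek_latency_s

# Hypothetical: 63 clients at the ≈9 ms latency measured on Tier C.
print(aggregate_iops(63, 0.009))  # ≈ 7000
```

The model's main lesson matches the benchmarks: random-access capacity is bounded by seek latency, so placing small-file data and metadata on the lowest-latency tier pays off directly.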

6. Maintenance, Lifecycle Management, and Online Reconfiguration

GPFS supports live addition and removal of hardware: new storage appears in the namespace as soon as it is zoned, while retirement involves draining pools via policy-driven file migration (e.g., mmapplypolicy) and then removing them. Policy engines evaluate SQL-style ILM and placement rules, and policy sweeps can be throttled to avoid network congestion during large migrations.
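The throttling idea can be sketched as a generic rate limiter: migrate in fixed-size batches and pause between batches so the sustained rate stays under a bandwidth cap. GPFS throttles sweeps through its own policy-engine tuning; this sketch only illustrates the pacing mechanism, and all sizes and caps are hypothetical.

```python
# Sketch of a throttled migration sweep: pace batches to a bandwidth cap.
# Generic rate limiter -- not GPFS's actual sweep implementation.
import time

def throttled_sweep(file_sizes: list[int], max_bytes_per_s: float,
                    batch_bytes: int = 64 * 1024 * 1024) -> int:
    """Simulate migrating files under a rate cap; returns bytes moved."""
    moved, batch = 0, 0
    for size in file_sizes:
        batch += size
        moved += size
        if batch >= batch_bytes:
            time.sleep(batch / max_bytes_per_s)  # pace to the cap
            batch = 0
    return moved

# Hypothetical: 100 files of 1 MiB each, capped at 1 GB/s.
print(throttled_sweep([1024 * 1024] * 100, 1e9))  # 104857600 bytes moved
```

In practice the cap would be set well below the interconnect capacity so that cluster and database traffic sharing the 40 Gb/s ring is unaffected.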

Online rebalancing and node failure handling are performed in background; data is redistributed and replica blocks are remapped automatically. Synchronization and recovery mechanisms do not disrupt filesystem availability.

7. Best Practices and Operational Lessons

Operational experience in Target demonstrates several best practices:

  • Define tiering for distinct workload classes (hot, warm, cold, archive) prior to deployment.
  • Mirror metadata pools across failure groups for zero-RPO.
  • Co-locate metadata and small-file data on the highest-IOPS tier for optimal performance.
  • Deploy ILM policies from inception to avoid manual migrations at petabyte scale.
  • Design network fabric for non-blocking, multi-path resilience; single points of failure severely impact performance and availability.
  • Retain GPFS block size defaults for mixed workloads.
  • Validate directory tree and inode scaling under production loads to preempt performance degradation from inode density.

In Target, GPFS has delivered over 1 PB of online disk and 10 PB of tape-backed storage, sustained throughput approaching 5 GB/s for LOFAR pipelines, more than 7,000 IOPS for AI workloads, and sub-10 ms random-read latencies. Continual expansion and reconfiguration are supported as storage technologies evolve and requirements scale (Belikov et al., 2011).

References

  • Belikov et al., 2011.
