Time Series Graph Data File (TGF)
- TGF is a DFS-native multi-version file format designed to store and traverse billion-edge time series graphs, supporting complete per-update histories and subsecond time-sliced reads.
- It employs a star-centric, columnar layout with compression-aware engineering and streaming I/O to achieve high-throughput queries, simulations, and historical snapshot recovery.
- TGF integrates lightweight range and Bloom indexes, fine-grained partitioning, and efficient vertex mapping to minimize I/O overhead and optimize query performance.
The Time Series Graph Data File (TGF) is a distributed file system (DFS)-native, multi-version file format at the core of SharkGraph, a time series graph system designed for the efficient storage, retrieval, and traversal of graphs at billion- and hundred-billion-edge scale. TGF supports a complete per-update history, enables efficient subsecond time-sliced reads, and ensures that no full-graph materialization in memory is required. Through a combination of star-centric data layout, compression-aware engineering, and streaming I/O, TGF enables high-throughput batch graph queries, simulations, data mining, clustering algorithms, and exact time traversal or snapshot recovery, even on industry-sized graphs spanning hundreds of billions of edges (Tang, 2023).
1. High-Level Role and Design Objectives
TGF addresses the core challenge of processing temporal graphs not only in their current state but also across any historical snapshot identified by an arbitrary timestamp. The solution partitions all edge and vertex updates by timestamp, making time a first-class partitioning key. Edges are grouped into "star blocks"—all edges sharing one source vertex—so that even high-degree vertices correspond to single units of I/O. All attributes, including timestamps and user fields, are stored in a columnar layout per edge or vertex attribute, with per-column compression optimized for time-order sorted data.
To facilitate fast queries and minimize superfluous I/O, each file header exposes a lightweight range index and, optionally, probabilistic (Bloom) indexes. The system streams only the minimal required blocks for a query, never loading non-essential portions into memory. By combining these storage and indexing techniques, TGF enables SharkGraph to efficiently "replay" or "rewind" historical graph states with per-query I/O costs scaling in proportion to the active subgraph (Tang, 2023).
2. On-Disk Layout and Partitioning
Each DFS partition (e.g., HDFS://graphId/dt=YYYY-MM-DD/hour=HH/edgeType=FOLLOWS/) contains two main file collections per time- and key-partition: edge files and vertex files.
Edge Files:
- Star-structure file: Encoded in protobuf or a custom format, the structure file includes a header (range index and optional Bloom index) and multiple blocks. Each "star block" enumerates all destination vertex ids for a specific source id, packed into a binary format. Every block groups all out-edges of a source for efficient retrieval.
- Attribute columns: Each edge attribute (timestamp, weight, label, etc.) is stored as a separate column file, aligned in row order to the structure file's leaves. Timestamp columns, for example, consist of an 8-byte base time followed by varint-encoded deltas for compactness.
Vertex Files:
- vid.seq: A strictly ascending 64-bit global id mapping, delta- or DFCM-compressible for space efficiency.
- route.bin: Maps each local vertex id to a (routeType, partitionID) pair, encoded as a 32-bit code to expedite partition lookups during traversal.
- Attribute columns: For each vertex property, a columnar-compressed sequence with timestamps; sorted by local vertex id.
This separation enables parallelism, columnar compression, and efficient block-wise access, while keeping metadata minimal and lookups fast.
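The layout above can be sketched as a few data structures. The field names below (`min_key`, `offset`, `bloom_bits`, etc.) are hypothetical stand-ins for header fields the source does not spell out; only the partition path scheme is taken directly from the text.

```python
# Minimal sketch of TGF's per-partition layout; field names are assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RangeEntry:
    """One range-index entry per star block in the file header."""
    min_key: int   # smallest source vertex id in the block
    max_key: int   # largest source vertex id in the block
    offset: int    # byte offset of the block within the file
    length: int    # compressed block length in bytes


@dataclass
class TgfHeader:
    range_index: List[RangeEntry]
    bloom_bits: Optional[bytes] = None   # optional per-block Bloom filters


def partition_path(graph_id: str, dt: str, hour: str, edge_type: str) -> str:
    """Build the time- and key-partitioned DFS directory from the text's example."""
    return f"HDFS://{graph_id}/dt={dt}/hour={hour}/edgeType={edge_type}/"
```

A reader first fetches the small header, then uses the range index to seek directly to the blocks it needs.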
3. Indexing Structures and Auxiliary Metadata
TGF leverages a compact block-wise range index and, optionally, per-block Bloom filters residing in each file header. The range index records, for every block, its key range and location (e.g., min key, max key, byte offset, and length), typically totaling a few kilobytes even for thousands of blocks. Bloom filters, if present, enable constant-time key presence checks, further narrowing candidate block reads.
Vertex file metadata includes the vid.seq header, which marks the partition's global-to-local id mapping, allowing binary search with delta decompression for efficient lookups. By integrating range filtering and Bloom tests, block reads required for narrow queries—whether by time range or key subset—are minimized, achieving near-zero random I/O outside the query's region of interest (Tang, 2023).
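The two lookups described here can be sketched as follows. This is a simplified model, not the paper's implementation: the range index is a list of per-block `(min_key, max_key)` pairs, Bloom filters are stood in for by Python sets, and vid.seq is modeled as blocks of `(base_gid, [delta, ...])` holding delta-encoded ascending global ids.

```python
# Sketch of header-driven block pruning and vid.seq global-to-local lookup.
import bisect
from typing import List, Optional, Sequence, Set, Tuple


def candidate_blocks(range_index: Sequence[Tuple[int, int]],
                     key: int,
                     bloom: Optional[Sequence[Set[int]]] = None) -> List[int]:
    """Return indices of blocks that may contain `key` (range + Bloom tests)."""
    hits = []
    for i, (lo, hi) in enumerate(range_index):
        if lo <= key <= hi and (bloom is None or key in bloom[i]):
            hits.append(i)
    return hits


def global_to_local(vid_blocks: Sequence[Tuple[int, List[int]]],
                    gid: int) -> Optional[int]:
    """Binary-search the block whose base id covers `gid`, then delta-decode."""
    bases = [base for base, _ in vid_blocks]
    i = bisect.bisect_right(bases, gid) - 1
    if i < 0:
        return None
    base, deltas = vid_blocks[i]
    local = sum(1 + len(vid_blocks[j][1]) for j in range(i))  # ids in earlier blocks
    cur = base
    if cur == gid:
        return local
    for d in deltas:
        cur += d
        local += 1
        if cur == gid:
            return local
    return None  # gid absent from this partition
```

Only blocks surviving both the range filter and the Bloom test need to be read and decompressed.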
4. Algorithms, Encodings, and Formulas
Key algorithms and data encodings underlying TGF's storage efficiency and traversal performance include:
- Block offset calculation: The range index records each block's byte offset and length, so block i begins at offset(i) = headerLen + sum of length(j) for j < i, allowing a reader to seek directly without scanning.
- Timestamp delta encoding (varint): The base timestamp is stored as 8 bytes; each subsequent delta Δt_i = t_i − t_{i−1} is encoded as a variable-length integer.
- DFCM float/double compression: For each consecutive value pair (x_{i−1}, x_i), store the XOR difference of their bit patterns as (lz, tz, payload), where lz and tz count leading and trailing zeros, respectively.
- Global-to-local vertex id mapping: Binary search identifies the containing block in vid.seq, followed by delta decompression to recover the local id.
- Vertex route code: Encoded as a 32-bit value packing routeType and partitionID (e.g., routeType in the high bits), with routeType in {src, dst, both}.
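The encodings above can be sketched in a few lines. The varint is the standard base-128 scheme, the XOR step follows the leading/trailing-zero idea described in the text, and the route-code bit split (2 high bits for routeType, 30 low bits for partitionID) is an assumed layout the source does not specify.

```python
# Sketches of TGF's column encodings; bit layouts are illustrative assumptions.
import struct


def encode_varint(n: int) -> bytes:
    """Standard base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def encode_timestamps(ts):
    """8-byte base timestamp followed by varint-encoded deltas (sorted input)."""
    buf = bytearray(struct.pack(">q", ts[0]))
    for prev, cur in zip(ts, ts[1:]):
        buf += encode_varint(cur - prev)
    return bytes(buf)


def xor_encode(prev_bits: int, cur_bits: int):
    """XOR consecutive 64-bit patterns; keep (leading zeros, trailing zeros, payload)."""
    d = prev_bits ^ cur_bits
    if d == 0:
        return (64, 0, 0)
    lz = 64 - d.bit_length()
    tz = (d & -d).bit_length() - 1
    return (lz, tz, d >> tz)


def pack_route(route_type: int, partition_id: int) -> int:
    """Assumed packing: routeType in {src=0, dst=1, both=2} in the 2 high bits."""
    return (route_type << 30) | (partition_id & 0x3FFFFFFF)
```

Because time-sorted deltas are small, most timestamps shrink from 8 bytes to 1 or 2 bytes before general-purpose compression even runs.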
Streaming graph algorithms iterate by reading only relevant star and attribute blocks, using the indexes for selective seeks. State recovery at a timestamp t involves traversing all relevant blocks with timestamps at or before t, filtering by time window in the I/O loop. Pseudocode provided in the original source details the tight loop for Pregel-style parallel traversal, demonstrating minimal in-memory materialization (Tang, 2023).
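A minimal sketch of such a time-sliced traversal loop is shown below. It is not the paper's pseudocode: star blocks are stood in for by an in-memory dict mapping `src -> [(dst, ts), ...]`, whereas a real reader would seek and decompress only the blocks selected by the range index.

```python
# Simplified time-sliced BFS over star blocks (in-memory stand-in for TGF reads).
from collections import deque


def khop_at_time(star_blocks, seeds, t, hops):
    """Traverse up to `hops` hops using only edges with timestamp <= t."""
    frontier, seen = deque(seeds), set(seeds)
    for _ in range(hops):
        next_frontier = deque()
        while frontier:
            src = frontier.popleft()
            for dst, ts in star_blocks.get(src, ()):   # one star block per source
                if ts <= t and dst not in seen:        # time-window filter in the I/O loop
                    seen.add(dst)
                    next_frontier.append(dst)
        frontier = next_frontier
    return seen
```

Edges newer than t are skipped inside the read loop itself, so a historical snapshot never needs to be materialized before traversal begins.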
5. Performance, Compression, and Partitioning Trade-Offs
Experimentally, TGF + ZSTD achieves on-disk graphs roughly 30% smaller than uncompressed Parquet or GraphX. Block size (default 16 MB, tunable 8–64 MB) balances random access latency (favoring small blocks) against sequential I/O and compression efficiency (favoring large blocks). The compression pipeline combines varint (ints), DFCM (floats), and dictionary (strings) encodings with ZSTD, yielding substantial space reduction at under 30% CPU overhead and outperforming alternatives like Snappy on large datasets.
Global-to-local id remapping halves per-id storage (8 B to 4 B), saving 20–30% of total space. Fine-grained 3D partitioning could, in theory, scatter the edges of hot key pairs, but at the partition counts evaluated, empirical results show that cross-partition overhead remains manageable even on skewed graphs.
Practical benchmarks report that a batch 3-hop traversal on a 200 billion-edge social graph executes three times faster than Apache Spark GraphX, with 30% less memory used. Time traversal and historical snapshot recovery cost approximately that of a standard single-version traversal due to optimized indexing (Tang, 2023).
6. Reimplementation and Engineering Considerations
A minimal reimplementation of TGF requires the following layered techniques:
- Star-centric, columnar file layout: Block all edges per source ("star"), each with matched attribute columns.
- Per-block range and Bloom indexes: Implemented in file headers for subsecond selective reads.
- Compression pipeline: Preprocess columns by data type, then use general-purpose compression such as ZSTD.
- route.bin mapping: Connect local vertex ids to relevant edge files, supporting partitioned traversal.
- Streaming I/O engine: Hot loop filters star and timestamp indices during traversal, emitting only next-needed vertices.
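The compression-pipeline layer above can be sketched as a per-type dispatch followed by a general-purpose compressor. This is a simplified illustration, not the paper's codecs: zlib stands in for ZSTD (which is not in the Python standard library), the int path stores 8-byte deltas rather than varints, and the string dictionary format is invented for the example.

```python
# Simplified per-type preprocessing + general-purpose compression (zlib as a
# stand-in for ZSTD; encodings are illustrative, not TGF's actual formats).
import struct
import zlib


def preprocess(column, kind):
    if kind == "int":
        # Delta-encode sorted integers (fixed 8-byte deltas for simplicity).
        deltas = [column[0]] + [b - a for a, b in zip(column, column[1:])]
        return b"".join(struct.pack(">q", d) for d in deltas)
    if kind == "float":
        # XOR consecutive 64-bit patterns (DFCM-like, simplified to raw words).
        bits = [struct.unpack(">Q", struct.pack(">d", x))[0] for x in column]
        xored = [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]
        return b"".join(struct.pack(">Q", x) for x in xored)
    if kind == "str":
        # Dictionary encoding: one code byte per value, then the sorted dictionary.
        dic = sorted(set(column))
        codes = bytes(dic.index(s) for s in column)  # assumes < 256 distinct values
        return codes + b"\x00" + "\x1f".join(dic).encode()
    raise ValueError(f"unknown column kind: {kind}")


def compress_column(column, kind):
    """Type-aware preprocessing, then a general-purpose compressor."""
    return zlib.compress(preprocess(column, kind))
```

The point of the two-stage design is that type-aware preprocessing exposes redundancy (small deltas, shared bit prefixes, repeated strings) that the generic compressor can then exploit far more effectively.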
All operations outside core traversal and compute resolve to reading, decompressing, filtering, and emitting relevant graph fragments. This layered architecture sustains scalability to hundreds of billions of edges without the need to ever fully materialize the graph in memory (Tang, 2023).
Reference: (Tang, 2023) "SharkGraph: A Time Series Distributed Graph System".