- The paper introduces GDlog, a novel GPU-based Datalog engine that leverages a Hash-Indexed Sorted Array for efficient parallel deductive analytics.
- It employs innovative join processing and lock-free deduplication techniques to achieve 5-10x speedups over state-of-the-art CPU-based engines.
- Evaluation on multiple data center GPUs confirms robust performance and scalability, paving the way for enhanced program analysis and real-time data processing.
Modern Datalog on the GPU
Introduction
The paper "Modern Datalog on the GPU" introduces GDlog, an innovative Datalog engine designed to leverage the computational power of GPUs for high-performance deductive analytics. Datalog is a declarative logic programming language, prominent in applications such as static program analysis, network monitoring, and business analytics due to its support for recursive queries. Existing CPU-based Datalog engines often face scalability challenges, primarily due to limited memory bandwidth and synchronization overhead in shared-memory environments. GDlog addresses these limitations by harnessing the parallel processing capabilities of GPUs.
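The recursion that distinguishes Datalog is easiest to see in the canonical transitive-closure query. As a point of reference (not GDlog code, which is CUDA), here is a minimal Python sketch of semi-naive evaluation, the standard fixpoint strategy in which each iteration joins only the facts derived in the previous iteration:

```python
# Semi-naive evaluation of the classic recursive Datalog program:
#   path(x, y) :- edge(x, y).
#   path(x, z) :- path(x, y), edge(y, z).
def transitive_closure(edges):
    edges = set(edges)
    total = set(edges)   # all facts derived so far
    delta = set(edges)   # facts new in the last iteration
    while delta:
        # Semi-naive rule: join only the *new* path facts against edge/2,
        # rather than re-joining the whole relation each round.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - total   # deduplicate against everything known
        total |= delta
    return total
```

The deduplication step (`derived - total`) is exactly the operation that becomes a bottleneck at scale and that GDlog parallelizes on the GPU.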
Methodology
The paper presents an implementation of a modern in-memory Datalog engine on data center GPUs, achieving significant performance improvements. GDlog employs a SIMD API for iterated relational algebra kernels over a novel data structure called the Hash-Indexed Sorted Array (HISA). HISA combines the efficiency of range-indexed joins with dense data structure operations, enabling fast parallel computation. The introduction of HISA is a critical development, as it supports lock-free deduplication and efficient parallel insertion, essential for optimizing Datalog evaluation on GPU architectures.
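To make the HISA idea concrete, the following is a simplified CPU-side Python sketch (GDlog's real data structure is a CUDA implementation; the class and method names here are illustrative, not GDlog's API). Tuples live in one lexicographically sorted array, and a hash index maps each join-key value to its contiguous range in that array, so a join probe is a hash lookup followed by a dense slice:

```python
class HISASketch:
    """Illustrative sketch of a Hash-Indexed Sorted Array: tuples are kept
    in one sorted array, and a hash index maps each join-column value to
    its contiguous (start, end) range within that array."""

    def __init__(self, tuples, key_cols=1):
        # Sorting clusters tuples that share a join key and makes
        # deduplication a matter of dropping adjacent duplicates.
        self.data = sorted(set(tuples))
        self.index = {}
        for i, t in enumerate(self.data):
            key = t[:key_cols]
            start, _ = self.index.get(key, (i, i))
            self.index[key] = (start, i + 1)

    def probe(self, key):
        # Range-indexed join lookup: O(1) hash probe, then a dense slice.
        start, end = self.index.get(key, (0, 0))
        return self.data[start:end]
```

The payoff is that matching tuples are physically adjacent, so a GPU kernel can assign each probe result to consecutive threads with coalesced memory access, combining the range-query efficiency of a tree index with the throughput of a dense array.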
Evaluation
The evaluation demonstrates GDlog's competitive performance across various data center GPUs, including H100, A100, MI250, and MI50. GDlog consistently outperforms existing CPU and GPU-based systems, achieving speedups of 5 to 10 times over Soufflé, a state-of-the-art CPU-based Datalog engine, particularly in tasks such as context-sensitive points-to analysis. For join planning, GDlog's implementation translates Soufflé's compiled output, so both systems evaluate equivalent plans and the comparison isolates execution performance.
The paper's experimental results highlight substantial improvements in runtime and memory footprint. For instance, cuDF and GPUJoin encounter out-of-memory errors on multiple datasets, while GDlog executes efficiently across all test cases. This capability is largely attributed to GDlog's focus on minimizing memory usage and maximizing computational throughput.
Contributions
Key contributions of the paper include the following:
- Hash-Indexed Sorted Array (HISA): A novel data structure that optimizes range queries and minimizes memory footprint, specifically designed for GPU architectures.
- GDlog Library: A CUDA-based library facilitating high-throughput deductive analytics on GPUs, leveraging HISA for tuple representation.
- Optimization Techniques: Novel strategies for join processing, including eager buffer management and temporarily materialized k-ary joins, significantly reducing execution time.
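Lock-free deduplication deserves a brief illustration, since per-tuple locking would serialize thousands of GPU threads. A common GPU-friendly alternative is sort-then-unique: sort the batch of candidate tuples, drop adjacent duplicates, and filter against the existing relation. The sketch below shows that pattern in plain Python (the real engine would do this with parallel primitives in CUDA, e.g. in the style of `thrust::sort` followed by `thrust::unique`; the function and parameter names here are illustrative):

```python
def sort_unique_dedup(new_tuples, existing_tuples):
    """Sketch of lock-free, sort-based deduplication: no tuple ever needs
    a lock because duplicates become adjacent after sorting and can be
    dropped by a data-parallel scan."""
    batch = sorted(new_tuples)
    # Keep only the first element of each run of equal tuples
    # (the sequential analogue of a parallel unique pass).
    unique = [t for i, t in enumerate(batch) if i == 0 or t != batch[i - 1]]
    # Discard tuples already in the relation; in a HISA-like structure
    # this membership check is a hash probe into the sorted array.
    existing = set(existing_tuples)
    return [t for t in unique if t not in existing]
```

Because every step is a sort, scan, or filter, each maps directly onto massively parallel GPU primitives with no synchronization beyond the passes themselves.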
Implications and Future Work
The development of GDlog has practical implications for various domains, notably in program analysis, graph mining, and business intelligence, where high-volume data processing is crucial. The performance gains achieved by GDlog suggest that GPUs can play a transformative role in advancing computational capabilities for Datalog applications.
The paper sets the stage for future research into extending GDlog's capabilities, including support for multi-GPU environments, which could further enhance scalability for large-scale analytics tasks. Additionally, expanding the system to handle incremental computations over data streams could unlock new opportunities in real-time data analysis.
Conclusion
"Modern Datalog on the GPU" represents a significant advancement in the field of high-performance computing for logic programming. By effectively utilizing GPU architectures, GDlog not only addresses the limitations of existing CPU-based engines but also sets a new benchmark for performance, paving the way for more efficient and scalable data-intensive applications.