PS-PDG: Parallel Semantics Program Dependence Graph
- PS-PDG is a formal intermediate representation for parallel programs, precisely modeling semantic constraints for correct parallel execution.
- It extends classical PDGs by incorporating hierarchical nodes, directed and undirected edges with data selectors to capture parallel constructs.
- The construction algorithm encodes OpenMP/Cilk features such as variable privatization and reduction, revealing broader transformation opportunities.
The Parallel Semantics Program Dependence Graph (PS-PDG) is a formal intermediate representation for parallel programs that models the precise set of constraints necessary to guarantee semantic equivalence among all correct parallel execution plans. Designed to address limitations of classical Program Dependence Graphs (PDGs)—which target sequential code—the PS-PDG systematically encodes the rich semantics of parallel programming constructs, enabling optimizations not feasible within the traditional framework. By explicitly representing parallel regions, orderless interactions, and first-class parallel variables, the PS-PDG both delineates and expands the space of legal program transformations available to parallelizing and vectorizing compilers (Homerding et al., 2024).
1. Formal Definition and Structure
The PS-PDG is defined as the tuple

G = ⟨N, E, U, V, VA⟩

with the following elements:
- N: the set of nodes, including instruction nodes (N_I, one per static instruction) and hierarchical nodes (N_H, each representing a contiguous region such as a loop, task, or critical section).
- E: the set of directed edges, each representing a data, control, or parallel dependence tagged with a data-selector (AnyProducer, LastProducer, or AllConsumers).
- U: the set of undirected edges encoding orderless mutual-exclusion constraints, each scoped to a context c.
- V: a set of parallel-semantic variables (e.g., privatizable, reducible), each possibly carrying an update function f if reducible.
- VA: relations mapping each parallel variable to its defining and using instruction nodes.
The semantics of each edge or node depend on one or more contexts (parallel regions such as a specific loop or task) and traits (Atomic, Singular, Unordered), controlling allowed dynamic interactions.
Each directed edge in E carries a data-selector: for example, an edge ⟨n1 → n2, LastProducer, c⟩ enforces that n2 observes only the most recent (last) dynamic definition produced by n1 within context c. Undirected edges in U constrain dynamic executions not to overlap but permit any ordering.
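The tuple above can be sketched as plain data structures. This is a minimal illustrative encoding, not the paper's implementation; the enum, struct, and field names are assumptions of this sketch.

```c
#include <assert.h>

/* Data-selectors on directed edges in E (see Section 2). */
typedef enum { ANY_PRODUCER, LAST_PRODUCER, ALL_CONSUMERS } DataSelector;

/* Region-level traits attached to hierarchical nodes. */
enum { TRAIT_ATOMIC = 1u, TRAIT_SINGULAR = 2u, TRAIT_UNORDERED = 4u };

typedef struct {
    int id;          /* static instruction or region identifier */
    int is_region;   /* 1 for a hierarchical node (loop/task/critical) */
    unsigned traits; /* bitmask of TRAIT_* values */
} Node;

typedef struct {
    int src, dst;          /* node ids */
    DataSelector selector; /* which dynamic producers/consumers are visible */
    int context;           /* enclosing parallel region id */
} DirectedEdge;

typedef struct {
    int a, b;    /* unordered pair: dynamic executions must not overlap */
    int context;
} UndirectedEdge;

/* LastProducer: the consumer observes only the most recent dynamic
   definition within the edge's context. */
static int sees_last_only(const DirectedEdge *e) {
    return e->selector == LAST_PRODUCER;
}
```

A scheduler consuming this representation would consult the selector per edge rather than assuming sequential last-writer semantics everywhere.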
2. Extensions over the Classical Program Dependence Graph
The PS-PDG generalizes the PDG, which is typically structured as the pair ⟨N, E⟩, capturing only pairwise read-write and control dependences. The key innovations of the PS-PDG include:
- Hierarchical nodes (N_H): Model larger regions (e.g., loops, tasks, critical sections), allowing region-level attributes (Atomic, Singular, Unordered).
- Node traits: Label nodes to encode semantic constraints needed in parallel semantics, such as "atomicity" (mutual exclusion), "singularity" (single execution per context), and "unordered" (no total order requirement).
- Contexts (C): Each dependence or trait can be qualified within one or more enclosing parallel regions, ensuring precise scoping.
- Undirected edges (U): Represent mutual exclusion or orderless parallel constructs that have no prescribed execution order but cannot overlap.
- Data-selectors in E: Support for AnyProducer, LastProducer, and AllConsumers semantics allows encoding of non-sequential live-out behavior (e.g., privatization, reduction).
- First-class parallel variables (V, VA): Explicitly represent privatizable and reducible variables, associating usage/definition relationships and update functions for reductions.
These extensions are necessary and sufficient to capture the semantics of OpenMP/Cilk constructs, going beyond the sequential semantics of the PDG.
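To illustrate why these extensions widen the legal-transformation space, here is a hypothetical sketch of the kind of dependence filter a DOALL-legality check could apply once variables carry parallel-semantic traits: a loop-carried dependence on a variable marked privatizable or reducible in V no longer blocks DOALL. All names are illustrative, not the paper's API.

```c
#include <assert.h>

/* Variable kinds mirroring the traits recorded in the V set. */
typedef enum { VAR_ORDINARY, VAR_PRIVATIZABLE, VAR_REDUCIBLE } VarKind;

/* A dependence blocks DOALL only if it is loop-carried AND the variable
   involved has no parallel-semantic trait that removes the carry. */
static int blocks_doall(VarKind kind, int loop_carried) {
    if (!loop_carried)
        return 0;                 /* intra-iteration dependences never block DOALL */
    return kind == VAR_ORDINARY;  /* privatization/reduction removes the carry */
}
```

A classical PDG, having no VarKind distinction, would have to treat every loop-carried dependence as `VAR_ORDINARY` and reject the DOALL schedule.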
3. Construction Algorithm Overview
Given a parallel Intermediate Representation (IR) with OpenMP or TAPIR annotations, the construction of the PS-PDG proceeds as follows:
- Initialize empty sets for nodes, edges, variables, and variable associations: N, E, U, V, VA.
- For each static instruction, instantiate an instruction node and add it to N.
- For each parallel region r, collect all instruction nodes it contains, create a hierarchical node enclosing these, attach region-level traits as appropriate (e.g., Atomic for critical, Singular for single), and insert self-edges in U to enforce atomicity or singularity.
- For each pair of instructions and memory location, insert directed dependence edges into E as needed, using the correct data-selector and context according to variable semantics (e.g., firstprivate, reduction).
- For each reduction or privatizable variable, create a corresponding entry in V, and record its defining and using instructions in VA, along with the relevant reduction function for reducibles.
- Return the completed PS-PDG.
A topological schedule that respects all constraints in E according to their data-selectors, avoids overlap on U edges, and enforces atomicity and singularity traits guarantees full semantic equivalence with the original parallel program.
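The region step of this construction can be sketched as follows. This is a simplified C mock-up; the struct layout, capacity limits, and function names are assumptions of this sketch, not the paper's implementation.

```c
#include <assert.h>

#define MAX_NODES 64
#define TRAIT_ATOMIC   1u
#define TRAIT_SINGULAR 2u

typedef struct { int id; int is_region; unsigned traits; } Node;
typedef struct { int a, b, context; } UndirectedEdge; /* a member of U */

typedef struct {
    Node nodes[MAX_NODES];       int n_nodes;
    UndirectedEdge u[MAX_NODES]; int n_u;
} PSPDG;

/* Create a hierarchical node for a parallel region and, when the region
   is Atomic or Singular, insert an undirected self-edge into U so that
   its dynamic instances may not overlap. */
static int add_region(PSPDG *g, unsigned traits, int context) {
    int id = g->n_nodes;
    g->nodes[g->n_nodes++] = (Node){ id, 1, traits };
    if (traits & (TRAIT_ATOMIC | TRAIT_SINGULAR))
        g->u[g->n_u++] = (UndirectedEdge){ id, id, context };
    return id;
}

/* Helper used only to exercise the sketch. */
static int self_edges_for(unsigned traits) {
    PSPDG g = {0};
    (void)add_region(&g, traits, 0);
    return g.n_u;
}
```

The self-edge is what lets a later scheduler run dynamic instances of a critical section in any order while forbidding them from overlapping.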
4. Illustrative Example: OpenMP Loop with Reduction and Critical Section
Consider the following OpenMP annotated code:
```c
#pragma omp parallel for private(a) reduction(+:sum)
for (int i = 0; i < N; ++i) {
    a = …;
    #pragma omp critical
    { buf[k[i]] += a; }
    sum += compute(i);
}
```
- Nodes represent the loop, body instructions, and the critical section (modeled as a hierarchical node with Atomic trait and undirected self-edge to prohibit overlap).
- Directed dependence edges include:
  - A LastProducer edge from the definition of a to the critical region, scoped to the loop-body context, capturing that each critical region reads only the most recent definition of a in its iteration.
  - An order-relaxed dependence edge on sum (e.g., AnyProducer), as sum is a reducible variable.
- Variables in V include sum (type Reducible, update function +) and a (Privatizable).
- VA edges connect variable uses and defs, ensuring correct privatization and reduction semantics.
A classical PDG, lacking these features, would conservatively force global serialization by treating the reduction and critical-section updates as strict sequential dependences, precluding the DOALL schedule that the PS-PDG permits.
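The DOALL schedule that the PS-PDG admits corresponds, informally, to the following hand-rewritten form, shown here as plain serial C: each chunk works on a private copy of a and a private partial sum, and the partial sums are combined with + afterwards. The bodies of compute, the index map k, and the assignment to a are stand-ins for the example's elided details.

```c
#include <assert.h>
#define N 8

static int compute(int i) { return i; } /* stand-in for the example's compute() */

static int reduced_sum(void) {
    int buf[N] = {0}, k[N];
    for (int i = 0; i < N; ++i) k[i] = i; /* stand-in index map */

    int partial[2] = {0, 0};              /* one private partial sum per chunk */
    for (int t = 0; t < 2; ++t) {         /* two "threads", run here serially */
        for (int i = t * (N / 2); i < (t + 1) * (N / 2); ++i) {
            int a = i;                    /* privatized: fresh per iteration */
            buf[k[i]] += a;               /* the critical-section body: mutually
                                             exclusive but unordered across iterations */
            partial[t] += compute(i);
        }
    }
    return partial[0] + partial[1];       /* the reduction combine step (+) */
}
```

The PS-PDG's Privatizable trait on a, the Reducible trait on sum, and the Atomic-but-Unordered critical region are exactly the facts that make this chunked schedule legal.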
5. Compiler Optimization Opportunities and Empirical Impact
Use of the PS-PDG by compilers facilitates a broader and more accurate exploration of parallelization and transformation options:
- Parallelization Plan Exploration: An automatic parallelizer based on the PS-PDG (extending NOELLE) enumerates all viable parallelization strategies (DOALL, DSWP, HELIX) for loops with significant profile weight. The PS-PDG's representation exposes privatization, reduction, and orderless parallel constructs that PDG-based analyses cannot detect.
- Empirical Gains:
  - Parallelization-option space: For the NAS C benchmark suite, the PS-PDG pipeline exposes substantially more transformation choices than a PDG-based pipeline.
  - Critical-path speedup (on an ideal machine): Compared to a programmer's canonical OpenMP transformation, PDG-based parallelization shortens the critical path only modestly, while the PS-PDG achieves markedly shorter critical paths.
These results underscore the PS-PDG's ability to reveal and exploit parallelism latent in code that previous dependence representations cannot reach.
6. Practical Implementation and Future Directions
Practical considerations and ongoing research paths include:
- Compilation overhead: Construction of hierarchical nodes, region contexts, and VA edges induces a modest compile-time overhead relative to PDG construction.
- Metadata management: The front end must maintain and propagate context metadata for regions, which can complicate subsequent IR passes.
- Scalability: Deeply nested or numerous regions can result in large PS-PDGs; unused contexts may need to be garbage-collected to maintain efficiency.
- Potential extensions: Adding support for heterogeneous accelerators (offload contexts), integration with distributed memory models (e.g., MPI + OpenMP hybrids), and richer dynamic dependence models (e.g., symbolic data-selector policies) represent open directions.
Each extension in PS-PDG is both necessary to capture particular parallel-language features (e.g., OpenMP/Cilk semantics) and sufficient to enable compilers to safely enumerate a broad range of legal parallel schedules with substantial runtime performance and transformation opportunities (Homerding et al., 2024).