Parallel Semantics PDG (PS-PDG)
- Parallel Semantics PDG is a framework that extends traditional PDGs by incorporating hierarchical nodes and explicit node traits to accurately model parallel constructs.
- It introduces mechanisms for annotated data-selector semantics on edges and supports parallel semantic variables to handle reductions and privatization.
- Empirical evaluations demonstrate that PS-PDG expands scheduling options and reduces critical path lengths by up to twofold, improving parallel compiler optimization.
The Parallel Semantics Program Dependence Graph (PS-PDG) is a formal extension of the Program Dependence Graph (PDG) designed to model, analyze, and optimize explicitly parallel intermediate representations (IRs) in modern compilers. While classical PDGs capture the minimal set of control and data constraints needed to guarantee semantic equivalence in sequential programs, the PS-PDG introduces additional mechanisms to handle the semantics of parallel constructs, including fork–join, task and loop parallelism, reductions, atomics, and scoped or hierarchical contexts. This decouples the description of semantic constraints from the parallel execution plan, enabling compilers to generate a broader set of valid parallel schedules while strictly preserving program correctness (Homerding et al., 2024).
1. Formal Definition
The PS-PDG is formally defined as a 4-tuple

PS-PDG = (N, E, V, U)

where:
- N: Set of nodes. Each node is either a Plain node (grouping IR instructions) or a Hierarchical node (grouping child nodes), annotated with traits drawn from {atomic, orderless, singular} × C, where C is the set of context labels.
- E: Set of edges, partitioned into:
  - Directed edges (n_p, n_c, σ, c), where the data selector σ ∈ {any, last, all} specifies the data-instance relationship between producer n_p and consumer n_c in context c.
  - Undirected edges (n_1, n_2, c), enforcing mutual exclusion but not relative ordering.
- V: Set of parallel semantic variables. Each variable is annotated as privatizable (with possible reduction function) or reducible (user-supplied binary operator) in a given context.
- U: Use/def relations connecting variables in V to nodes in N.
Context labels index different semantic scopes (e.g., particular loops or parallel regions), allowing properties and constraints to be localized.
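The 4-tuple above can be sketched as plain data types. The following is a minimal illustration in C; all names (`Trait`, `Selector`, `Node`, and so on) are invented for this sketch, not taken from any PS-PDG implementation:

```c
/* Illustrative encoding of the PS-PDG 4-tuple (N, E, V, U).
   Names and layout are assumptions of this sketch, not the paper's API. */
#include <assert.h>
#include <stddef.h>

typedef enum { TRAIT_ATOMIC = 1, TRAIT_ORDERLESS = 2,
               TRAIT_SINGULAR = 4 } Trait;            /* node traits (bitmask) */
typedef enum { SEL_ANY, SEL_LAST, SEL_ALL } Selector; /* data selector on edges */

typedef struct Node {
    int id;
    int is_hierarchical; /* plain (groups IR insts) vs hierarchical (groups nodes) */
    unsigned traits;     /* bitmask of Trait values */
    int context;         /* context label in which the traits apply */
} Node;

typedef struct Edge {
    int producer, consumer;
    int directed;        /* 0 => mutual-exclusion edge: no relative ordering */
    Selector sel;        /* which producer instance(s) feed the consumer */
    int context;
} Edge;

typedef struct Var {     /* parallel semantic variable */
    const char *name;
    int privatizable;    /* thread-private copy allowed in this context */
    int reducible;       /* combined with a binary operator at region exit */
    int context;
} Var;
```

Use/def relations (U) would then be a list of (Var, Node) pairs; they are omitted here to keep the sketch short.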
2. Motivating Extensions and Comparison with Sequential PDG
The PS-PDG systematically extends the classical PDG to support explicit parallelism. The chief innovations are as follows:
- Hierarchical Nodes: Allow formation of higher-level regions (such as OpenMP critical or task blocks), supporting region-level semantics.
- Node Traits: Annotate nodes with atomic, orderless, or singular properties. This encodes, for instance, atomicity (critical/atomic regions), unordered execution (parallel sections), or single-execution constraints (OpenMP single).
- Context Sensitivity: Enables dependence and constraints to be local within specific regions/loops, not globally.
- Undirected Mutual Exclusion Edges: Capture “may not run in parallel” without requiring a happens-before order.
- Data-Selector Semantics on Edges: Allow specifying whether any, last, or all producer instances may provide data to a consumer (enabling accurate modeling of reductions and lastprivate).
- Parallel Semantic Variables and Use/Def Annotation: Annotate variables as privatizable or reducible, supporting explicit modeling of threadprivate state and reductions.
The following table, adapted from (Homerding et al., 2024), summarizes the differences:
| Feature | PDG | PS-PDG |
|---|---|---|
| Node Granularity | single instruction | instruction set / region (hierarchical node) |
| Atomic/Critical | N/A | trait |
| Loop-carried Indep. | N/A | trait w/ context |
| Single-execute | N/A | trait |
| Context sensitivity | global | per-region contexts |
| Must-precede edges | directed only | directed (with data selector) and undirected |
| Reductions | unsupported | reducible(var, f) in V |
| Privatization | unsupported | privatizable(var) in V |
These extensions collectively allow the PS-PDG to precisely express the legal set of parallelizations admitted by modern IRs—functionality out of reach for the classical PDG (Homerding et al., 2024).
3. Semantic Invariants and Scheduling Correctness
A schedule derived from a PS-PDG must observe several invariants:
- Directed-Edge Ordering: For any directed edge from producer p to consumer n with data selector σ in context c, every dynamic instance of n must wait for the producer instance(s) selected by σ in c to complete.
- Undirected-Edge Mutual Exclusion: For any undirected edge between nodes n1 and n2 in context c, no two instances of n1 and n2 may overlap in execution under c.
- Atomicity: If node n is atomic in context c, all dynamic instances of n under c execute without interleaving.
- Singularity: If n is singular, at most one dynamic instance of n exists in context c.
- Reduction Correctness: For reducible variables, the reduction operation must match sequential semantics.
- Privatization Cleanup: Privatized variables must either be properly reduced or discarded at parallel region boundaries.
These invariants define the legal set of parallel execution plans (schedules) that preserve the original semantics (Homerding et al., 2024).
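As a concrete reading of the first two invariants, a candidate schedule can be modeled as one time interval per dynamic instance and checked pairwise. The following is a hedged sketch: the `Interval` type and both predicates are illustrative names for this example, not part of the paper's formalism.

```c
/* Pairwise checks for two PS-PDG scheduling invariants on a candidate
   schedule. Each dynamic instance occupies a half-open time interval.
   Illustrative sketch; names are assumptions of this example. */
#include <assert.h>

typedef struct { int node; double start, end; } Interval;

/* Directed-edge ordering: the consumer instance may start only after
   the selected producer instance has completed. */
static int respects_ordering(Interval producer, Interval consumer) {
    return consumer.start >= producer.end;
}

/* Undirected-edge mutual exclusion: the two instances must not overlap,
   but either relative order is legal. */
static int respects_mutex(Interval a, Interval b) {
    return a.end <= b.start || b.end <= a.start;
}
```

A scheduler would apply `respects_ordering` across every directed edge and `respects_mutex` across every undirected edge (and every pair of instances of an atomic node) before accepting a plan.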
4. Example: OpenMP Parallel Loop with Reduction and Critical Section
Consider the OpenMP kernel:
```c
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    #pragma omp critical
    sum += A[i];
}
```
In the PS-PDG, this structure is represented as:
- Nodes: a hierarchical node n_loop (the for loop, introducing context c_loop) and a plain node n_add (the addition, atomic in c_loop).
- Directed edge: (n_loop, n_add) in c_loop.
- Undirected edge: (n_add, n_add) in c_loop, enforcing mutual exclusion among instances of the addition without ordering them.
- Variable: sum, reducible with operator + in c_loop; use and def edges connect sum to n_add.
This encoding grants the compiler more freedom than a classic PDG, where a loop-carried data dependence on sum would enforce strict serialization. The PS-PDG enables:
- Tree-reduction or fan-in parallelism.
- Non-sequential scheduling of the additions (the reducible annotation means no producer ordering is enforced).
- Possible elimination of the critical section by using a software reduction.
This illustrates PS-PDG’s ability to encode parallel semantics compactly while enlarging the set of valid schedules (Homerding et al., 2024).
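The tree-reduction freedom the PS-PDG admits can be sketched in plain C: instead of serializing every `sum += A[i]` behind a critical section, partial values are combined pairwise over roughly log2(N) rounds. The helper below is hypothetical and written sequentially for clarity; the point is that the combines within each round are mutually independent and could run in parallel.

```c
/* Tree reduction over an array, in place. Each outer iteration is one
   round; the inner combines of a round touch disjoint elements and are
   therefore independent. Illustrative sketch, not the paper's code. */
#include <assert.h>

static double tree_reduce(double *a, int n) {
    for (int stride = 1; stride < n; stride *= 2)
        for (int i = 0; i + stride < n; i += 2 * stride)
            a[i] += a[i + stride];   /* independent within a round */
    return a[0];                     /* the fan-in root holds the sum */
}
```

Because floating-point addition is only approximately associative, a compiler applying this schedule relies on the reduction annotation to license the reassociation.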
5. Quantitative Evaluation and Optimization Benefits
Empirical evaluation, implemented as an extension to the NOELLE LLVM-based auto-parallelizer, demonstrates substantial benefits:
- Exploration of Parallelization Options: PS-PDG increases the number of legal parallelization plans—on NAS C-benchmarks, PDG alone sees 12.3 options/loop, Jensen et al. workshare analysis raises this to 18.7, while PS-PDG enables 43.5 (over 3× increase).
- Reductions in Critical Path Length: On an idealized unbounded-core CPU, PS-PDG reduces the critical path by 1.82× on average (up to 2.1×) relative to the programmer-supplied plan, whereas the classic PDG achieves only 1.14×. See the following table:
| Benchmark | # Options (PDG) | # Options (PS-PDG) | CritPath Speedup (PDG) | CritPath Speedup (PS-PDG) |
|---|---|---|---|---|
| CG | 8 | 28 | 1.12× | 2.05× |
| IS | 14 | 55 | 1.08× | 1.85× |
| FT | 10 | 35 | 1.20× | 1.65× |
| MG | 7 | 21 | 1.05× | 1.50× |
| SP | 5 | 18 | 1.10× | 1.40× |
| Average | 12.3 | 43.5 | 1.14× | 1.82× |
PS-PDG thus substantially enlarges the legal scheduling space and enables significantly shorter critical paths while maintaining strict semantic equivalence (Homerding et al., 2024).
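The critical-path numbers above follow from longest-path analysis of the must-precede graph on an idealized machine. A minimal sketch (illustrative code, not NOELLE's implementation) that computes the critical path of a cost-weighted DAG given in topological order, showing how removing must-precede edges shortens it:

```c
/* Critical path = longest weighted path through the must-precede DAG.
   Nodes are assumed numbered in topological order; edges[i][j] != 0
   means node j must wait for node i. Illustrative sketch only. */
#include <assert.h>

#define MAXN 16

static int critical_path(int n, int edges[][MAXN], const int cost[]) {
    int finish[MAXN];
    int best = 0;
    for (int j = 0; j < n; j++) {
        int start = 0;                   /* earliest start: after all preds */
        for (int i = 0; i < j; i++)
            if (edges[i][j] && finish[i] > start)
                start = finish[i];
        finish[j] = start + cost[j];
        if (finish[j] > best)
            best = finish[j];
    }
    return best;
}
```

With four unit-cost nodes, the serial chain 0→1→2→3 yields a critical path of 4; dropping all must-precede edges (as a reducible annotation permits for the additions in Section 4) drops it to 1, mirroring the table's speedups in miniature.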
6. Significance and Implications
PS-PDG addresses the primary deficiency of existing parallel IRs and PDG-based optimization frameworks: the inability to explicitly capture the minimum necessary constraints on parallel execution for semantic equivalence. By precisely expressing not only control and data dependences but also fine-grained parallel traits and context-sensitive constraints, PS-PDG forms a foundational tool for parallelizing compilers.
A plausible implication is the facilitation of advanced optimization strategies—such as aggressive reduction tree scheduling, elimination or transformation of atomic/critical sections, and safe exploitation of orderless or singular regions—all with provable semantic preservation.
PS-PDG’s explicit separation of semantic constraints (“what must happen”) from the concrete parallel plan (“how it happens”) gives compilers broad latitude to retarget code to complex heterogeneous architectures or adapt to evolving parallel programming constructs, while providing strong correctness guarantees (Homerding et al., 2024).
7. Conclusion
The Parallel Semantics Program Dependence Graph (PS-PDG) generalizes the classical PDG to fully encompass the requirements of explicitly parallel IRs, introducing hierarchical nodes, fine-grained node traits, context-sensitive constraints, advanced edge semantics, and explicit support for reductions and privatization. By fundamentally extending the representational power of dependence graphs, the PS-PDG enables compilers to robustly optimize and schedule parallel programs, substantially expanding the space of provably correct schedules and enabling reductions in critical path length by up to a factor of two compared to classic approaches—all while strictly maintaining the original parallel semantics (Homerding et al., 2024).