
162-Scenario Benchmark Analysis

Updated 5 February 2026
  • 162-Scenario Benchmark is a systematically constructed suite of 162 workload scenarios that simulate diverse parallelism, communication, and imbalance in distributed programming systems.
  • It employs a Cartesian product of orthogonal parameters to ensure reproducibility and fair comparisons across various runtime paradigms.
  • Key insights include the evaluation of Minimum Effective Task Granularity (METG), baseline overheads, and scalability, which guide performance optimization.

A 162-scenario benchmark refers to a systematically constructed suite of 162 distinct, parameterized workload “scenarios” used for evaluating parallel and distributed programming systems. The canonical realization of this concept is Task Bench (Slaughter et al., 2019), which defines the scenarios as the Cartesian product of orthogonal execution parameters that distill the key characteristics of large-scale applications. The benchmark has been deployed to compare runtime overhead, scalability, and communication/imbalance mitigation capabilities of 15 programming systems on leadership-class supercomputers.

1. Scenario Space Construction and Parameterization

Task Bench’s approach centers on a factorized space of execution time, parallelism, dependency structure, computational intensity, communication pattern, and imbalance. Each parameter has documented discrete or continuous values, generating the scenario universe via their Cartesian product. The core parameters are as follows:

Parameter | Values / Range | Description
--- | --- | ---
height | 1,000 | Timesteps per run
width | 32 | Independent tasks per timestep
dependence pattern | stencil, nearest-K, spread-K | Communication graph per timestep
ngraphs | 1, 4 | Number of parallel task graphs
kernel type | compute-bound, memory-bound | AVX2 integer loop or streaming memory operations
iterations | 27 points (~20 µs–200 ms) | Task "grain size"; samples the METG curve
bytes per dependency | 16 B, 64 KiB | Data transferred along communication edges
imbalance profile | balanced, randomized | Task work is constant or drawn from U[0, 1)

Each concrete scenario is defined by a unique tuple of parameter settings. The 162 scenarios referenced in (Slaughter et al., 2019) correspond specifically to the compute-bound, single-node benchmarking grid with 3 dependency patterns, 2 ngraphs options, and 27 iteration counts: 3 × 2 × 27 = 162.
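The 3 × 2 × 27 grid can be enumerated directly as a Cartesian product. A minimal sketch in Python (the parameter names and the logarithmic grain-size spacing are illustrative, not Task Bench's actual configuration flags):

```python
from itertools import product

# Illustrative single-node, compute-bound scenario grid:
# 3 dependency patterns x 2 graph counts x 27 grain sizes = 162 scenarios.
patterns = ["stencil", "nearest", "spread"]
ngraphs_options = [1, 4]

# 27 logarithmically spaced task grain sizes spanning ~20 us to ~200 ms.
grains_us = [20 * (200_000 / 20) ** (i / 26) for i in range(27)]

scenarios = [
    {"pattern": p, "ngraphs": n, "grain_us": g}
    for p, n, g in product(patterns, ngraphs_options, grains_us)
]
print(len(scenarios))  # 162
```

Enumerating the grid up front makes it straightforward to drive every system under test through an identical, reproducible scenario list.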

2. Motivation and Design Principles

Traditional benchmarks—focused on single kernels or end-to-end applications—are insufficient for isolating baseline system overheads or evaluating a runtime’s ability to manage communication, concurrency, and imbalance. The 162-scenario design enables:

  • Separation of concerns: Implementation of workload logic is decoupled from runtime instrumentation; each system (MPI, OpenMP, task-based, workflow) executes the same suite.
  • Parametric stress testing: The multidimensional scenario matrix sweeps through task granularities, synchronization patterns, and resource contention profiles, pinpointing the boundary between efficient and inefficient execution for each runtime.
  • Reproducibility: By fixing parameter values, all systems are subjected to identical, repeatable workloads, facilitating fair cross-system comparisons.

3. Experimental Protocol and System Coverage

The suite is instantiated on hardware such as Cori Haswell nodes (2×16-core Intel Xeon E5-2698 v3, Aries interconnect). Each scenario is run on all 15 programming systems evaluated in the study, which are representative of major runtime paradigms:

  • Message passing: MPI (Cray MPICH), MPI+OpenMP, MPI+CUDA
  • Task-based/HPC: OmpSs, OpenMP 4.0 tasks, PaRSEC (DTD/PTG), Realm/Regent, StarPU
  • PGAS/actors: Chapel, Charm++, X10
  • Workflow/data analytics: Dask, Spark, Swift/T, TensorFlow

For each of the 162 scenarios, a driver emits task-graph and kernel calls, invoking the backend of each programming system. Evaluations include both single-node and multi-node (scaling) studies.
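The sweep structure described above can be sketched as a doubly nested loop over backends and scenarios. This is only an illustration of the protocol's shape: Task Bench's real drivers are per-system and the `run_scenario` callable here is hypothetical, supplied by the harness for each programming system under test.

```python
def run_all(scenarios, backends, run_scenario):
    """Sweep every (backend, scenario) pair.

    run_scenario(backend, scenario) -> elapsed seconds; a hypothetical
    hook that invokes the given system's driver on one scenario.
    """
    results = {}
    for backend in backends:
        for i, sc in enumerate(scenarios):
            results[(backend, i)] = run_scenario(backend, sc)
    return results

# Stub harness: two backends, three scenarios, constant fake timings.
scenarios = [{"pattern": p} for p in ("stencil", "nearest", "spread")]
timings = run_all(scenarios, ["mpi", "charm++"], lambda b, sc: 1.0)
print(len(timings))  # 6
```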

4. Minimum Effective Task Granularity (METG)

A central metric emerging from the 162-scenario methodology is the Minimum Effective Task Granularity (METG). For a given system and scenario, METG quantifies the smallest task duration at which the system attains at least η · 100% of its maximum attainable throughput:

\begin{aligned}
\text{Throughput: } & P(g) = W / T(g) \\
\text{Peak: } & P_{\max} = \max_g P(g) \\
\text{Efficiency: } & E(g) = P(g) / P_{\max} \\
\text{METG: } & \mathrm{METG}(\eta) = \min\{\, g \mid E(g) \geq \eta \,\}
\end{aligned}

where g is the average task size in µs. For 50% efficiency, METG(0.5) yields the canonical task-granularity score quoted throughout (Slaughter et al., 2019).
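The definition above translates directly into code. A minimal sketch, assuming throughput has already been measured at a set of sampled grain sizes (the sample curve below is synthetic, not a measured result):

```python
def metg(samples, eta=0.5):
    """Minimum Effective Task Granularity.

    samples: list of (grain_us, throughput) pairs, where throughput is
             P(g) = W / T(g) for a fixed total amount of work W.
    Returns the smallest grain size g with efficiency
    E(g) = P(g) / P_max >= eta, or None if no sample qualifies.
    """
    p_max = max(p for _, p in samples)
    eligible = [g for g, p in samples if p / p_max >= eta]
    return min(eligible) if eligible else None

# Toy sweep: throughput saturates as tasks become coarser.
curve = [(1, 0.2), (5, 0.45), (10, 0.7), (50, 0.95), (100, 1.0)]
print(metg(curve, eta=0.5))  # 10
```

With η = 0.5 this reproduces the METG(50%) figure-of-merit: the finest granularity at which the system still delivers half its peak throughput.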

5. Key Quantitative Findings

The exhaustive 162-scenario experiment uncovers significant absolute and relative differences in baseline system costs and scaling characteristics. Salient results:

  • Baseline overheads at 50% efficiency: METG(0.5) ranges widely across systems:
    • Charm++: ~0.4 µs
    • MPI: ~4.6 µs
    • Task-based PaRSEC/PTG, Regent: 1–5 µs
    • Dask/StarPU: 10–100 µs
    • Spark: ≥0.1 s
  • Scalability: As node count increases, METG for MPI grows sublinearly (to ~60 µs at 256 nodes), but analytics frameworks (Spark, Swift/T) degrade rapidly, requiring ~1 ms or more per task at scale.
  • Communication hiding: With large dependencies (64 KiB), task systems that permit asynchronous overlap (Chapel, Charm++, PaRSEC) sustain 50% efficiency at task sizes of ~100 µs, whereas classic MPI is bottlenecked at ~200 µs.
  • Load imbalance: Asynchronous task frameworks (Regent, PaRSEC, Charm++) largely recover performance in randomized/imbalanced scenarios, in contrast to static scheduling frameworks where METG increases sharply.
  • GPU offload cost: MPI+CUDA on NVIDIA P100 achieves METG(50%) ~15 µs, with higher METG when over-decomposing or partitioning GPUs.

These outcomes are summarized in performance tables and METG curves, which show that at fine task granularity, performance in scheduling runtimes, message-passing layers, and analytics frameworks alike is often limited by the system's own overheads rather than by the kernels themselves.

6. Impact and Broader Significance

The use of a 162-scenario benchmark delivers several advances:

  • It provides a universal, scalable substrate for controllably stressing parallel systems across realistic axes of parallelism, contention, and task structure.
  • It enables direct, apples-to-apples comparison of runtime overheads, scalability, and resilience to imbalance and communication, beyond simple application-level benchmarks.
  • METG, as exposed via these scenarios, has become an important comparative figure-of-merit for runtime developers to target when optimizing for fine-grained parallelism.
  • The methodology is widely applicable, with suggestions that even more elaborate scenario spaces (e.g., more dependency types, distributed graph topologies, broader kernel portfolios) may be built by extension.

A plausible implication is that this scenario-based benchmarking approach can be employed in other domains (e.g., AI pipelines, microservices, AIOps) to provide similarly systematic, parameterized coverage of operational properties.

7. Limitations and Further Developments

The 162-scenario construction in Task Bench is specific to a set of parameters designed for compute and memory-bound kernels on distributed shared-nothing architectures. The framework does not directly model application-level dependencies with arbitrary DAGs or encompass domain-specific constraints found in AI, streaming, or real-time workloads. Additionally, some interpreted and data-analytics runtimes (Spark, Swift/T) demonstrated prohibitive overheads for fine-grained tasks, indicating a misalignment with low-latency workloads.

Subsequent works have adapted scenario-based benchmarks for broader workflow types, custom dependency graphs, and domain-specific needs, highlighting the foundational role of enumerative parameterized scenario suites in modern benchmarking (Slaughter et al., 2019).

References

  • Slaughter, E., et al. (2019). "Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance." arXiv:1908.05790.
