Dual Structured Workflows
- Dual structured workflows form a two-layered architecture that separates high-level control from fine-grained compute execution.
- They utilize an outer control layer for incident management and an inner compute layer for local task orchestration, reducing overhead.
- Applied in supercomputing, distributed analytics, and stochastic processing, this model enhances scalability and performance under uncertainty.
Dual structured workflows are workflow-composition architectures that unify two distinct, interoperable layers of orchestration, most often to efficiently manage heterogeneous, complex, or time-constrained workloads under uncertainty and system constraints. Characteristic instantiations include external ("control" or marshalling) engines that manage data sources, user interaction, and resource coordination, paired with internal ("compute" or execution) workflows responsible for fine-grained, platform-local task coupling. This structural decoupling enables optimized performance, scalability, and expressivity across a variety of domains, from urgent supercomputing and distributed analytics to stochastic process partitioning (Brown et al., 2022, Huberman et al., 2015, Ramon-Cortes et al., 2020).
1. Definitional Foundations and Formal Structure
Dual structured workflows refer to composition schemes that partition workflow logic across two cooperating workflow systems or channels, each with clear programmatic or operational roles. In the “outer” layer, a marshalling, scheduling, or user-facing workflow orchestrates high-level incident control, job submission, and error handling. In the “inner” layer, a fine-grained workflow is embedded within compute allocations to manage simulation coupling, preprocessing, postprocessing, and parallelization.
Formally, this pattern is exemplified by the VESTEC + Common Workflow Language (CWL) model in urgent computing (Brown et al., 2022); the outer workflow (with stages) manages incident progression and HPC resource brokering, while the inner workflow (with stages, often CWL-based) handles simulation and data pipeline logic internal to a job allocation. In another paradigm, dual-channel partitioning splits a workload stochastically or deterministically over two processing “channels,” optimizing for metrics such as expected completion time and variance (Huberman et al., 2015).
Hybrid workflow programming models generalize this structure as a bipartite graph of tasks and streams with edges encoding both task-based and dataflow-based dependencies. The structure enables dynamic data exchanges and real-time stream handling interleaved with traditional task orchestration (Ramon-Cortes et al., 2020).
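The bipartite task/stream structure described above can be sketched as a small graph model; the class and method names below are illustrative assumptions, not the COMPSs or DistroStream API:

```python
# Minimal sketch of a hybrid workflow as a bipartite task/stream graph.
# All names here are illustrative assumptions, not a real workflow API.

class HybridWorkflow:
    def __init__(self):
        self.tasks = set()
        self.streams = set()
        self.edges = []  # (producer, consumer) pairs; one endpoint is a stream

    def add_task(self, name):
        self.tasks.add(name)

    def add_stream(self, name):
        self.streams.add(name)

    def publish(self, task, stream):
        # Task writes to a stream (dataflow edge: task -> stream).
        self.edges.append((task, stream))

    def subscribe(self, stream, task):
        # Task reads from a stream (dataflow edge: stream -> task).
        self.edges.append((stream, task))

    def is_bipartite(self):
        # Every edge must connect a task to a stream, never task-task
        # or stream-stream: the defining property of the hybrid model.
        return all(
            (a in self.tasks and b in self.streams) or
            (a in self.streams and b in self.tasks)
            for a, b in self.edges
        )

wf = HybridWorkflow()
wf.add_task("simulate")
wf.add_task("render")
wf.add_stream("frames")
wf.publish("simulate", "frames")   # simulation emits frames as produced
wf.subscribe("frames", "render")   # rendering consumes frames immediately
```

The bipartite constraint is what lets a runtime interleave stream handling with task orchestration: every dataflow exchange is mediated by an explicit stream node rather than a hidden task-to-task channel.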
2. Architecture and System Components
A canonical dual structured workflow system comprises:
- Control/Marshalling Layer (Outer System):
Orchestrates global workflow state, user or event inputs, incident management, and high-level resource allocation. In the VESTEC architecture, this includes an External Services API (HTTP/JSON), data-source interfaces, a Python-based workflow manager, and simulation and data-management backends accessed through REST/RPC (Brown et al., 2022).
- Compute/Execution Layer (Inner System):
Operates within a compute allocation, managing intra-job orchestration. Using CWL, a “skeleton” workflow (.cwl file) is instantiated with one or more scenario/machine-specific YAML files. The CWL runner orchestrates preprocessing, parameter sweeps (scatter/gather), MPI runs, and postprocessing steps completely within the job's lifetime (Brown et al., 2022).
- Interoperability/API Boundary:
Stable, minimal APIs abstracting submission, data staging (put/get), and job-status callbacks (STARTED, COMPLETED, FAILED) separate the two layers, enabling independent evolution and modularity. YAML provides configuration inheritance separating scenario and machine specialization (Brown et al., 2022).
- Hybrid Programming Model Layer:
For hybrid task/dataflow systems, the DistroStream Library (Java, Python) with Apache Kafka or directory-monitor-backed implementations enables streams as first-class parameters, allowing seamless transition between batch and streaming paradigms in a single runtime model (COMPSs) (Ramon-Cortes et al., 2020).
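The interoperability boundary between the layers can be sketched as a minimal submission, staging, and callback interface; the class and method names below are assumptions for illustration, not the VESTEC API:

```python
# Minimal sketch of the outer/inner API boundary: job submission, data
# staging (put/get), and status callbacks. Names are illustrative
# assumptions, not the VESTEC External Services API.

STATES = ("STARTED", "COMPLETED", "FAILED")

class ComputeLayer:
    """Inner layer: runs within an allocation, reports status upward."""
    def __init__(self, on_status):
        self.on_status = on_status
        self.staged = {}

    def put(self, key, data):          # stage data into the allocation
        self.staged[key] = data

    def get(self, key):                # stage data out of the allocation
        return self.staged[key]

    def run(self, job_id, workflow):
        self.on_status(job_id, "STARTED")
        try:
            result = workflow(self.staged)   # inner (e.g. CWL-style) logic
            self.staged["result"] = result
            self.on_status(job_id, "COMPLETED")
        except Exception:
            self.on_status(job_id, "FAILED")

class ControlLayer:
    """Outer layer: tracks incident-level state via callbacks only."""
    def __init__(self):
        self.job_log = []

    def status_callback(self, job_id, state):
        assert state in STATES
        self.job_log.append((job_id, state))

control = ControlLayer()
compute = ComputeLayer(on_status=control.status_callback)
compute.put("input", [1, 2, 3])
compute.run("job-42", lambda staged: sum(staged["input"]))
```

Because the outer layer sees only `put`/`get` and the three status states, either side can be reimplemented independently, which is the modularity argument made above.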
3. Mathematical Models and Performance Analysis
Dual-Workflow Composition (VESTEC + CWL)
Given $E$ ensemble members, a scatter width of $s$ members packed per node, and a single queue submission of $n = \lceil E/s \rceil$ nodes:
- Submission time $T_{\mathrm{sub}}$ (per batch job)
- Queue latency $T_{\mathrm{queue}}$ (per batch job)
- Inner makespan $T_{\mathrm{inner}}$ (the CWL workflow inside the allocation)
- Total makespan: $T \approx T_{\mathrm{sub}} + T_{\mathrm{queue}} + T_{\mathrm{inner}}$
Compared to the single-ensemble-per-job strategy (where $s = 1$, requiring $E$ separate submissions), node packing with the dual workflow collapses submission/queue overhead from $E\,(T_{\mathrm{sub}} + T_{\mathrm{queue}})$ to a single $T_{\mathrm{sub}} + T_{\mathrm{queue}}$ (Brown et al., 2022).
Dual-Channel Partitioning Under Uncertainty
For a stochastic workload partitioned by a fraction $p \in [0,1]$ onto two machines with known per-unit runtime means and variances $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, the per-channel runtimes are modeled as independent normals, $T_1 \sim \mathcal{N}(p\mu_1,\, p^2\sigma_1^2)$ and $T_2 \sim \mathcal{N}((1-p)\mu_2,\, (1-p)^2\sigma_2^2)$. The overall makespan is $T = \max(T_1, T_2)$, with expected completion time and variance
$$\mathbb{E}[T] = \int_0^\infty \bigl(1 - F(t, t)\bigr)\,dt, \qquad \operatorname{Var}[T] = \mathbb{E}[T^2] - \mathbb{E}[T]^2,$$
where $F(t_1, t_2)$ is the joint CDF of $(T_1, T_2)$. The optimal fraction $p^*$ is obtained by grid or line search to minimize $\mathbb{E}[T]$ and/or $\operatorname{Var}[T]$, typically producing a Pareto frontier (Huberman et al., 2015).
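For two independent normal channel runtimes, the expected maximum has a standard closed form (Clark, 1961), which makes the line search over the split fraction straightforward. The sketch below uses illustrative parameters only:

```python
import math

def _Phi(x):   # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _phi(x):   # standard normal PDF
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expected_makespan(p, mu1, sd1, mu2, sd2):
    """E[max(T1, T2)] for independent T1 ~ N(p*mu1, (p*sd1)^2) and
    T2 ~ N((1-p)*mu2, ((1-p)*sd2)^2), via Clark's closed form."""
    m1, s1 = p * mu1, p * sd1
    m2, s2 = (1.0 - p) * mu2, (1.0 - p) * sd2
    s = math.sqrt(s1 * s1 + s2 * s2)
    if s == 0.0:
        return max(m1, m2)
    a = (m1 - m2) / s
    return m1 * _Phi(a) + m2 * _Phi(-a) + s * _phi(a)

def best_split(mu1, sd1, mu2, sd2, steps=1000):
    """Grid search over the fraction p of work sent to channel 1."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda p: expected_makespan(p, mu1, sd1, mu2, sd2))

# Identical channels: the optimal split is even.
p_even = best_split(10.0, 2.0, 10.0, 2.0)
# Channel 1 faster per unit of work: it should receive more than half.
p_fast = best_split(8.0, 2.0, 12.0, 2.0)
```

Minimizing variance instead of (or jointly with) the mean uses the same search over a different objective, which is how the Pareto frontier is traced out.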
4. Practical Implementations and Application Domains
Dual structured workflows are deployed in:
- Urgent and Interactive Supercomputing:
Used for time-critical scenarios such as space weather prediction, where decoupling incident control (VESTEC) from HPC orchestration (CWL) enables real-time data feed integration, ensemble parallelization, and efficient policy enforcement under batch queue limits. Empirical benchmarking on ARCHER2 demonstrated that MPI+Scatter (dual layering) outperforms monolithic (MPI-only) or micro-batch (scatter-only) strategies, especially for large ensemble sizes (Brown et al., 2022).
- Distributed Data Science and Analytics:
COMPSs hybrid workflows (task + stream) enable immediate frame processing in simulations, supporting continuous input/output, as in iterative simulations where each timestep result can be processed as soon as it is generated. Hybrid implementation yielded up to 23% speedup in continuous processing scenarios, and 33% gain in iterative algorithms beyond 32 iterations (Ramon-Cortes et al., 2020).
- Stochastic and Uncertain Process Partitioning:
Dual-channel partitioning for optimization or file transfer applications systematically lowers both mean and variance of completion times, as validated in convex-optimization under CPU contention (18–22% mean, 30–40% variance drop) and parallel file transfer (15–20% mean, 50–60% variance reduction) (Huberman et al., 2015).
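The task/stream overlap described for the COMPSs case, where each timestep result is processed as soon as it is generated, can be mimicked with a producer/consumer sketch; the threading setup here illustrates the pattern only and is not the DistroStream implementation:

```python
import queue
import threading

# Producer/consumer sketch of task+stream overlap: each simulation
# timestep's frame is processed as soon as it is published, rather
# than after the whole simulation completes. Illustrative only.

SENTINEL = object()

def simulate(frames_out, n_steps):
    for step in range(n_steps):
        frames_out.put(step * step)   # stand-in for a simulation frame
    frames_out.put(SENTINEL)          # close the stream

def process(frames_in, results):
    while True:
        frame = frames_in.get()
        if frame is SENTINEL:
            break
        results.append(frame + 1)     # stand-in for per-frame processing

frames = queue.Queue()
results = []
producer = threading.Thread(target=simulate, args=(frames, 5))
consumer = threading.Thread(target=process, args=(frames, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The speedups cited above come from exactly this overlap: consumption starts while production is still running, removing the artificial barrier between the two phases.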
5. Interoperability, Challenges, and Best Practices
API, Workflow, and Data Coupling
Interoperability is achieved via:
- RESTful JSON over HTTPS for high-level workflow management.
- RPC/REST for job and data marshalling interfaces.
- YAML/JSON for configuration inheritance and machine/scenario specialization.
- Kafka streams or shared filesystem directories for streaming data movement and first-class parameter passing (Brown et al., 2022, Ramon-Cortes et al., 2020).
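Configuration inheritance of the kind listed above (machine-level defaults specialized by scenario overrides) can be sketched as a layered merge; all keys and values below are hypothetical:

```python
# Sketch of machine/scenario configuration inheritance: a scenario
# layer overrides machine-level defaults, recursing into nested
# sections. All keys and values are hypothetical.

def inherit(base, override):
    """Merge override onto base, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = inherit(merged[key], value)
        else:
            merged[key] = value
    return merged

machine = {          # machine-level defaults (e.g. one YAML file)
    "scheduler": "slurm",
    "resources": {"nodes": 4, "walltime": "01:00:00"},
}
scenario = {         # scenario specialization (e.g. a second YAML file)
    "resources": {"nodes": 16},
    "ensemble_size": 64,
}
config = inherit(machine, scenario)
```

Keeping machine and scenario concerns in separate layers is what allows the same generic workflow skeleton to be retargeted without editing hardcoded machine logic.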
Lessons and Best Practices
- Heterogeneous workloads necessitate dual structure to provide both coarse incident-level control and intrajob fine-grained orchestration (Brown et al., 2022).
- Small, stable API boundaries minimize coupling and insulate outer/inner workflow evolution.
- Generic, parametrized workflow definitions (CWL + YAML layers) are preferable to hardcoded machine logic.
- Streaming annotations in hybrid task/dataflow settings avoid artificial barriers and maximize parallelism (Ramon-Cortes et al., 2020).
6. Limitations and Future Directions
- Batch schedulers in HPC enforce job and concurrency limits, which the dual structured approach works around, but persistent queue/coupling idiosyncrasies remain (Brown et al., 2022).
- In streaming models, centralized metadata servers (e.g., DistroStream Server) can become bottlenecks; more scalable, decentralized tracking is an open issue (Ramon-Cortes et al., 2020).
- Current hybrid workflow implementations lack INOUT streaming, advanced partitioning policies, and additional backend support (e.g., for MQTT, Pulsar, or multi-mount file systems) (Ramon-Cortes et al., 2020).
- Semantic integration, such as generating one workflow’s definition from the other, is an emerging requirement for correctness and end-to-end guarantees (Brown et al., 2022).
- The general theory of partitioning can extend to more than two channels and online parameter estimation, with direct application to domains such as job scheduling and network traffic management (Huberman et al., 2015).
7. Comparative Synopsis and Performance Table
| Architecture | Outer Layer | Inner Layer | Key Interface |
|---|---|---|---|
| VESTEC + CWL (Brown et al., 2022) | Incident Control, GUI | HPC CWL Workflow | REST/RPC, YAML |
| Dual Channel (Huberman et al., 2015) | Partition Decision | Parallel Execution | N/A (distribution) |
| Hybrid Workflow (COMPSs) (Ramon-Cortes et al., 2020) | Task & Stream Graph | Streams, Kafka, FS | DistroStream API |
This comparative structure highlights the duality principle: control/coordination is explicitly separated from execution/dataflow, yielding reductions in overhead (from one batch submission per ensemble member to a single packed submission in supercomputing), improved performance, and robustness to variability and uncertainty across diverse domains.