Data-Compute-Network Co-Orchestration
- Data-Compute-Network Co-Orchestration is a unified approach to managing data, computation, and network resources to achieve near-optimal performance in complex, distributed environments.
- It integrates three core models—data, compute, and network—using optimization techniques such as MILP, LP relaxations, and deep reinforcement learning to ensure efficient resource allocation.
- Applications include exascale scientific workflows, federated edge-cloud microservices, and privacy-aware AI inference, demonstrating significant performance and cost improvements.
Data-compute-network co-orchestration denotes the unified, joint management of data, computational resources, and network fabric in distributed systems. Unlike traditional siloed approaches—where data placement, task scheduling, and network provisioning are independently optimized—co-orchestration explicitly seeks globally optimal or near-optimal outcomes by considering the intertwined constraints and objectives of all three resource classes. This paradigm has emerged in response to the demands of exascale science, distributed AI/ML, edge computing, and real-time, data-intensive applications, all of which exhibit deep coupling between data location, computational placement, and network paths.
1. Core Models and Architectural Paradigms
Modern co-orchestration frameworks abstract the system into three primary models:
- Network Model: Encodes network topology, link bandwidths, latencies, and dynamic link availability. Nodes reason about in-range peers, link-up/down events, and possible communication substrates (wireless mesh, SDN slices, L2/L3 overlays) (Dona et al., 2024, Mauro et al., 2024, Yuan et al., 2022, Taleb et al., 2022).
- Compute Model: Abstracts task descriptions, resource requirements (CPU/GPU/memory), scheduling deadlines, and current resource occupancy. Formal task models range from microservice components in a DAG to containerized pods, RL rollouts, or function chains (Dona et al., 2024, Sofia et al., 19 Jan 2026, Tan et al., 3 Jan 2026).
- Data Model: Captures data types (raw, processed, indexed), location metadata, access latency, freshness requirements, and permissions. In privacy-sensitive systems, this model will specify trust/visibility boundaries (Malepati, 29 Nov 2025, Cai et al., 2022).
These models are interlocked in deployment; co-orchestration solutions employ agent-based, centralized, or federated architectures to update, synchronize, and act on state information in real time.
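The three models above can be captured as lightweight state records that an orchestrator updates and queries. The following is a minimal illustrative sketch; the field names are assumptions for exposition, not the schema of any cited framework:

```python
from dataclasses import dataclass

@dataclass
class NetworkModel:
    """Topology and link state visible to the orchestrator."""
    # (src, dst) -> {"bandwidth_mbps": float, "latency_ms": float, "up": bool}
    links: dict

@dataclass
class ComputeModel:
    """Per-node capacity and current occupancy."""
    # node -> {"cpu": int, "mem_gb": int, "used_cpu": int, "used_mem_gb": int}
    nodes: dict

@dataclass
class DataModel:
    """Dataset location, size, and access-policy metadata."""
    # dataset -> {"node": str, "size_gb": float, "trusted_zones": set}
    datasets: dict

@dataclass
class SystemState:
    """Joint state that co-orchestration decisions are made against."""
    network: NetworkModel
    compute: ComputeModel
    data: DataModel
```

Keeping the three models in one `SystemState` object is what distinguishes co-orchestration from siloed designs: a placement decision can consult link state and data location in the same step.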
Major architectural patterns include:
- Dynamic Meshes at the Edge: Resources in limited proximity discover and opportunistically form short-lived wireless meshes, exchanging only tasks or summaries, not bulk data (Dona et al., 2024).
- Federated and Multi-Domain Control Planes: Hierarchical control strata (broker, cross-domain conductor, domain orchestrator, infrastructure) decompose global intents into per-domain allocations, with federated APIs harmonizing cross-domain resource semantics and enforcement (Taleb et al., 2022, Sofia et al., 19 Jan 2026).
- Agent-Based Distributed Coordination: Specialized agents collect telemetry, run privacy or capacity inference, and coordinate resource selection under global or local policies (Malepati, 29 Nov 2025).
2. Mathematical Formulations and Optimization Problems
The formal treatment of data-compute-network co-orchestration commonly employs constrained optimization, often instantiated as mixed-integer linear programs (MILP), multi-criteria LP relaxations, or online stochastic control.
Typical decision variables and constraints include:
- Decision Variables: Task-to-resource assignment, network link usage, resource reservation levels, and, where necessary, data movement operations.
- Objective Functions: Minimization of joint cost functions encompassing data volume transferred, compute time, resource utilization, storage occupancy, power cost, or privacy risk (Malepati, 29 Nov 2025, Mauro et al., 2024, Cai et al., 2022, Würthwein et al., 2022).
- Constraints: Node compute and storage capacities, link bandwidth and latency, privacy/trust attributes (e.g., in privacy-aware routing), and data locality or staleness bounds.
Example: For an information-aware DAG mapped onto a physical graph, the orchestration MILP decides binary placement variables (whether a given function is placed at a given node) and flow variables (how much of each data stream traverses each link), minimizing total compute and communication cost under capacity and chaining constraints (Mauro et al., 2024).
Multi-objective approaches—critical in privacy- or cost-aware inference—frame the problem as vector minimization of, e.g., latency, monetary cost, and privacy risk, subject to per-request feasibility predicates (Malepati, 29 Nov 2025).
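The joint objective can be made concrete on a toy instance. The sketch below enumerates all task-to-node assignments exhaustively instead of calling a MILP solver (tractable only for tiny instances); the cost weights, node names, and dictionary fields are illustrative assumptions:

```python
from itertools import product

def joint_cost(assignment, tasks, nodes, transfer_cost):
    """Compute cost plus data-movement cost of one task->node assignment."""
    cost = 0.0
    for task, node in zip(tasks, assignment):
        t = tasks[task]
        cost += t["cpu"] * nodes[node]["cpu_price"]               # compute cost
        if t["data_at"] != node:                                   # data must move
            cost += t["data_gb"] * transfer_cost[(t["data_at"], node)]
    return cost

def feasible(assignment, tasks, nodes):
    """Respect per-node CPU capacity."""
    load = {n: 0 for n in nodes}
    for task, node in zip(tasks, assignment):
        load[node] += tasks[task]["cpu"]
    return all(load[n] <= nodes[n]["cpu_cap"] for n in nodes)

def best_assignment(tasks, nodes, transfer_cost):
    """Exhaustive search over the (tiny) assignment space."""
    best, best_c = None, float("inf")
    for assignment in product(nodes, repeat=len(tasks)):
        if feasible(assignment, tasks, nodes):
            c = joint_cost(assignment, tasks, nodes, transfer_cost)
            if c < best_c:
                best, best_c = assignment, c
    return best, best_c
```

A real deployment replaces the enumeration with a MILP solver or LP relaxation, but the structure of the objective (compute cost coupled to data-movement cost through the same assignment variables) is exactly what makes the problem a *co*-orchestration problem.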
3. Algorithms, Protocols, and Scheduling Methods
State-of-the-art co-orchestration employs a range of algorithmic strategies:
- Static MILP/LP Decomposition with Randomized Rounding: Service DAGs are decomposed into multicast forests, LPs are solved for function placement and flow embedding, and solutions are rounded to integral, feasible allocations with probabilistic guarantees on cost and constraint violation (Mauro et al., 2024).
- Agent-Driven, Multi-Objective Scheduling with Heuristic Search: Distributed agents collect system state and per-request attributes, apply domain-specific filters (e.g., data-locality, privacy), and select feasible assignments using weighted scoring or auction-based negotiation, as in the WAVES routine (Malepati, 29 Nov 2025).
- Deep Reinforcement Learning and Graph Neural Policy Networks: For large-scale, online scenarios (e.g., resource-disaggregated optical datacenters), GNN-based RL agents learn network- and compute-aware allocation policies end-to-end, demonstrating scalability and efficient resource packing (Shabka et al., 2022).
- Distributed Online Max-Weight Routing: Hybrid queue/state-based scheduling blends backpressure with topology-aware bias, supporting optimal throughput and delay in NDN-based computing overlays (Feng et al., 2022).
- Adaptive Device- and Data-Locality Policies: Pseudocode and practical routines prioritize task placement on nodes hosting required data or in close network proximity, reducing bandwidth demand and latency (Malepati, 29 Nov 2025, Dona et al., 2024).
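A simplified weighted-scoring placement routine in the spirit of the data-locality policies above can be sketched as follows. The weights, node attributes, and scoring terms are illustrative assumptions, not the WAVES routine itself:

```python
def score_node(task, node, weights):
    """Higher score = better placement. Combines locality, free capacity, proximity."""
    locality = 1.0 if node["hosts_data"] else 0.0
    capacity = node["free_cpu"] / node["total_cpu"]
    proximity = 1.0 / (1.0 + node["hops_to_data"])   # fewer hops -> closer to 1.0
    return (weights["locality"] * locality
            + weights["capacity"] * capacity
            + weights["proximity"] * proximity)

def place(task, nodes, weights):
    """Filter out infeasible nodes, then pick the highest-scoring feasible one."""
    candidates = {name: n for name, n in nodes.items()
                  if n["free_cpu"] >= task["cpu"]}
    if not candidates:
        return None  # no feasible placement; caller may queue or reject
    return max(candidates, key=lambda name: score_node(task, candidates[name], weights))
```

The filter-then-score shape mirrors the agent-driven pattern: domain-specific filters (here, CPU feasibility; in practice also privacy and trust) prune the candidate set before the multi-objective score ranks survivors.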
Protocol mechanisms vary: from REST/HTTP/gRPC APIs that propagate resource state in federated architectures (Taleb et al., 2022, Sofia et al., 19 Jan 2026), to P4-based programmable switch logic for in-network coherence and protection (Lee et al., 2021), and packet-level beaconing and RPC in mobile mesh contexts (Dona et al., 2024).
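The max-weight/backpressure idea mentioned above can be illustrated for a single link: forward the commodity with the largest positive queue differential, biased by a topology-aware term that penalizes neighbors farther from the destination. This is a schematic single-step decision rule, not the cited protocol itself:

```python
def max_weight_choice(local_queues, neighbor_queues, hop_bias, beta=1.0):
    """Pick the commodity with the largest backpressure weight on this link.

    weight(c) = (local backlog - neighbor backlog) - beta * hop_bias(c),
    where hop_bias penalizes commodities for which this neighbor lies
    farther from the destination (the topology-aware bias term).
    Returns None when no commodity has strictly positive weight.
    """
    best_c, best_w = None, 0.0
    for c in local_queues:
        w = local_queues[c] - neighbor_queues.get(c, 0) - beta * hop_bias.get(c, 0)
        if w > best_w:   # forward only on strictly positive weight
            best_c, best_w = c, w
    return best_c
```

Pure backpressure (all `hop_bias` values zero) is throughput-optimal but can route packets on long detours; the bias term trades a little generality for much better delay, which is the hybrid the distributed max-weight schemes exploit.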
4. Practical Use Cases and Implementation Scenarios
Co-orchestration systems are deployed in highly diverse environments:
- Scientific Big Data and Exascale Flows: Integrated orchestration of Rucio-based data management, HTCondor job scheduling, and SENSE/NSI programmable networks enables just-in-time data staging and compute “gang scheduling” for exabyte-scale workflows (e.g., LHC/CMS)—directly reducing required storage buffers and improving makespans (Lehman et al., 2022, Würthwein et al., 2022).
- Augmented and Virtual Reality (NextG Media Services): Media service DAGs spanning edge/cloud are mapped to compute, storage, and network resources to optimize cost, latency, and capacity utilization in interactive applications (Mauro et al., 2024).
- Privacy-Aware, Decentralized AI Inference: Distributed agent systems (IslandRun) enforce privacy, trust, and data-locality, using reversible anonymization and multi-objective routing to orchestrate across heterogeneous personal and cloud resources (Malepati, 29 Nov 2025).
- Federated Edge-Cloud Microservice Orchestration: Extensions of Kubernetes (CODECO) leverage semantic application models, AI-driven context scoring, and partition-based federation to enable scalable, policy-compliant deployment and migration across edge clusters (Sofia et al., 19 Jan 2026).
- Resource-Disaggregated Data Centers: Programmable switches (MIND) or device-to-host co-designs (ORCA) centralize or eliminate memory management and data paths, supporting elastically scalable workloads with line-rate performance (Lee et al., 2021, Yuan et al., 2022).
- Disaggregated Reinforcement Learning Pipelines: Hybrid optical-electrical fabrics (OrchestrRL) time-multiplex high-bandwidth network resources to match the varying demands of parallel RL generation and training phases, coordinated via adaptive compute and network schedulers (Tan et al., 3 Jan 2026).
5. Evaluation Methodologies, Metrics, and Reported Results
Co-orchestration effectiveness is measured through a spectrum of metrics:
- Throughput and Acceptance Ratio: Quantifies successfully placed/completed requests against offered load (Shabka et al., 2022, Cai et al., 2022).
- Makespan and Latency: Captures total workflow duration (including data movement, compute, and inter-stage dependencies) and per-task or per-stage end-to-end delays (Mauro et al., 2024, Lehman et al., 2022, Würthwein et al., 2022).
- Resource Utilization: Tracks CPU, memory, storage, and network usage across nodes; compared to baseline heuristic or centralized scheduling (Malepati, 29 Nov 2025, Shabka et al., 2022).
- Network Overheads and SLA Violations: Monitors inter-cluster or inter-domain data transfer, probe traffic, and compliance with latency/bandwidth guarantees (Sofia et al., 19 Jan 2026).
- Cost-Efficiency and Power Consumption: Ratio of realized workload to infrastructure cost or energy (Yuan et al., 2022, Tan et al., 3 Jan 2026).
- Scalability: Performance degradation as the number of nodes, requests, or federated domains increases; applicability of learned policies across unseen topologies (Shabka et al., 2022, Sofia et al., 19 Jan 2026).
- Policy Compliance and Privacy Guarantees: Measured by the fraction of traffic adhering to placement, data-sovereignty, or anonymization policies (Malepati, 29 Nov 2025).
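Several of these metrics reduce to simple aggregations over a request or task log. A sketch, with field names assumed for illustration:

```python
def acceptance_ratio(requests):
    """Fraction of offered requests that were successfully placed/completed."""
    return sum(1 for r in requests if r["accepted"]) / len(requests)

def makespan(tasks):
    """Total workflow duration: latest finish minus earliest start,
    spanning data-movement and compute stages alike."""
    return max(t["end"] for t in tasks) - min(t["start"] for t in tasks)

def utilization(used, capacity):
    """Per-resource utilization, e.g. CPU cores or link bandwidth."""
    return used / capacity
```

The subtlety in practice is attribution rather than arithmetic: a makespan that includes data staging credits (or blames) the network model as well as the scheduler, which is why co-orchestration evaluations report these metrics jointly rather than per silo.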
Reported results include order-of-magnitude savings in storage for exascale workflows (Würthwein et al., 2022), up to 1.4× throughput improvement over static scheduling in RL workloads (Tan et al., 3 Jan 2026), and <10% capacity violation guarantees in multi-criteria approximations for DAG-based service deployment (Mauro et al., 2024).
6. Critical Analysis, Limitations, and Future Directions
Current challenges and open problems in data-compute-network co-orchestration include:
- Dynamic and Online Optimization: Most convex or LP-based frameworks assume static DAGs or workload arrivals. Dynamic, online adaptation—especially under stochastic link failures or mobile environments—remains open (Mauro et al., 2024, Shabka et al., 2022).
- Multi-Domain and Privacy Constraints: Harmonizing resource specifications, SLAs, and trust across administrative and legal boundaries is complex; standardized data models are incomplete, and security/trust logic is insufficiently mature (Taleb et al., 2022, Malepati, 29 Nov 2025).
- Resource Modeling and Heterogeneity: Extensions to GPU, FPGA, RAN slices, or energy-constrained settings are nascent; integrating multi-dimensional resource vectors and non-convex dependencies is nontrivial (Mauro et al., 2024, Sofia et al., 19 Jan 2026).
- Zero-Touch, Intent-Driven Operation: Fully declarative, closed-loop orchestration with robust telemetry, self-tuning, and minimal human intervention is an active research area (Taleb et al., 2022, Sofia et al., 19 Jan 2026).
- Performance Guarantees at Scale: Quantitative SLAs for tail latency, jitter, reliability, or multi-tenancy isolation across domains are difficult to enforce; formal proofs of optimality, convergence, or regret for online algorithms are often missing (Dona et al., 2024, Lehman et al., 2022).
- Integration with Lower-Level Network Schedulers: Harmonizing co-orchestration logic with hardware-level switch, programmable NIC, or in-memory fabrics is only partially realized, though switch-based MMU and direct cc-accelerator attachment show promise (Lee et al., 2021, Yuan et al., 2022).
Potential directions highlighted in the literature include embedding learned predictors for link state and peer reliability (AI-driven orchestration), formalizing joint optimization over stochastic time-varying graphs, and developing adaptive, privacy-aware, federated decision support systems that generalize across heterogeneous infrastructures (Dona et al., 2024, Sofia et al., 19 Jan 2026, Malepati, 29 Nov 2025).
7. Comparative Frameworks and Representative Implementations
The following table summarizes key features of representative co-orchestration systems discussed above.
| Framework | Domain | Core Techniques | Notable Features |
|---|---|---|---|
| AirDnD (Dona et al., 2024) | Edge/Mobile Mesh | Three-model abstraction | In-range compute marketplace, mesh-based offloading |
| IslandRun (Malepati, 29 Nov 2025) | Distributed AI | Multi-objective MILP, agents | Privacy-compliant routing, reversible anonymization |
| NextG Orchestration (Mauro et al., 2024) | Media/Edge-Cloud | DAG-to-Forest, multi-criteria LP | Function/flow placement, capacity-aware rounding |
| CODECO (Sofia et al., 19 Jan 2026) | Federated K8s | Context-AI, partitioned federation | Hybrid governance, AI-assisted placement |
| OrchestrRL (Tan et al., 3 Jan 2026) | Disaggregated RL | MILP/planner + OCS fabric | Hybrid optical-electrical, compute-network slack co-scheduling |
| MIND (Lee et al., 2021) | Disaggregated Datacenter | Network MMU, in-fabric coherence | Transparent elasticity, line-rate shared memory |
| ORCA (Yuan et al., 2022) | µs-scale Datacenter | RDMA/coherent accelerator | Unified ring abstraction, TPH-aware DMA |
| SDADO (Feng et al., 2022) | NDN/Distributed Compute | Service discovery + max-weight | Distributed, backpressure + topology-aware routing |
All entries above are reported or designed to maximize system-wide efficiency and/or enforce application-specific SLOs by jointly reasoning over data, computation, and network—including in environments with stringent privacy, trust, or mobility constraints.
Comprehensive data-compute-network co-orchestration is rapidly transitioning from theoretical vision to practical imperative in both science and industry. Recent developments establish foundational models, algorithms, and architectures, but full realization of robust, adaptive, and scalable orchestration remains an open, interdisciplinary challenge that subsumes systems, optimization, networking, security, and AI (Dona et al., 2024, Malepati, 29 Nov 2025, Sofia et al., 19 Jan 2026, Würthwein et al., 2022, Tan et al., 3 Jan 2026).