
Multi-DC Optical Networks

Updated 30 December 2025
  • Multi-Datacenter Optical Networks are high-capacity fiber systems connecting geographically dispersed datacenters to support distributed machine learning with pipeline-parallel training.
  • The CBA framework dynamically adjusts frequency slot allocation and employs MILP-based scheduling to achieve a 31% reduction in iteration time and improved network performance.
  • Experimental evaluations on NSFNET topologies confirm that real-time resource adaptation and contiguity-aware path selection significantly reduce bubble ratios and blocking probabilities.

Multi-Datacenter Optical Networks constitute the physical and algorithmic foundation for distributed machine learning training that spans geographically separated datacenters interconnected via high-capacity optical fiber networks. These systems are increasingly critical for scaling LLM and deep neural network (DNN) training where hardware resources in a single facility are insufficient. Multi-DC optical networks introduce novel challenges in resource assignment, communication scheduling, and system optimization, necessitating frameworks that co-design pipeline-parallel training algorithms with real-time network state awareness, latency estimation, and traffic engineering. Below, key principles, frameworks, and results from recent advances such as CBA ("Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks" (Fu et al., 23 Dec 2025)) are summarized in rigorous detail alongside representative approaches.

1. Distributed Training over Multi-DC Optical Network Topologies

Multi-DC optical networks are typically abstracted as a graph $G = (V, E)$, where $V$ represents individual datacenters (DCs) and $E$ the fiber links, each supporting $W$ frequency slots per fiber (e.g., $W = 80$ slots of $12.5$ GHz each in the NSFNET topology (Fu et al., 23 Dec 2025)). Each link $e$ maintains a binary frequency-slot occupancy vector $s_e[1..W]$ at time $t$. In pipeline-parallel (PP) distributed LLM training, $L$ layers are partitioned into $P$ stages $P_0, \dots, P_{P-1}$, each mapped to a GPU, often spread across multiple DCs. The $M$ micro-batches per iteration together trigger $(P-1) \cdot M$ inter-DC transmission requests via dynamic optical network traffic, where link occupancy may overlap across requests due to temporal demand.
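The abstraction above can be sketched as a small data structure: a graph whose edges carry binary frequency-slot occupancy vectors. This is an illustrative sketch, not the paper's implementation; the `Link` class, `free_blocks` helper, and node count are assumptions.

```python
# Minimal sketch (assumed structure, not the paper's code): a multi-DC
# optical network as an adjacency dict whose edges carry binary
# frequency-slot occupancy vectors s_e[1..W].
W = 80  # frequency slots per fiber link (12.5 GHz each)

class Link:
    def __init__(self):
        self.slots = [False] * W  # True = slot occupied

    def free_blocks(self, width):
        """Start indices of contiguous free blocks of the given width."""
        starts, run = [], 0
        for i, occ in enumerate(self.slots):
            run = 0 if occ else run + 1
            if run >= width:
                starts.append(i - width + 1)
        return starts

# Topology: adjacency dict of DC -> {neighbor: Link} (14 NSFNET nodes)
network = {dc: {} for dc in range(14)}

def add_fiber(u, v):
    link = Link()
    network[u][v] = link
    network[v][u] = link  # both directions share one occupancy vector here

add_fiber(0, 1)
link = network[0][1]
link.slots[10:14] = [True] * 4   # occupy slots 10-13
print(link.free_blocks(10)[:3])  # → [0, 14, 15]
```

Real controllers would track occupancy per direction and per fiber pair; a single shared vector keeps the sketch short.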

Key metrics are:

  • Per-iteration runtime $T_{iter}$: wall-clock time from the start of the first forward pass to the end of the last backward micro-batch.
  • Bubble ratio $R_{bubble}$: the proportion of iteration time spent idling owing to communication delays.
  • Blocking probability $p_{block}$: the fraction of transmission requests that cannot be assigned a feasible path and frequency-slot block, inducing delay or cancellation.
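All three metrics can be computed from a per-iteration schedule trace. The sketch below is illustrative only; the trace layout and function name are assumptions, not the paper's instrumentation.

```python
# Illustrative metric computation from a per-GPU busy-interval trace
# (assumed layout, not the paper's code).
def iteration_metrics(busy, t_start, t_end, requests_blocked, requests_total):
    """busy: {gpu_id: list of (start, end) compute intervals}."""
    t_iter = t_end - t_start                    # per-iteration runtime
    idle = 0.0
    for intervals in busy.values():
        compute = sum(e - s for s, e in intervals)
        idle += t_iter - compute                # time each GPU spent waiting
    bubble_ratio = idle / (t_iter * len(busy))  # fraction of idle GPU-time
    p_block = requests_blocked / requests_total # blocking probability
    return t_iter, bubble_ratio, p_block

m = iteration_metrics(
    {0: [(0.0, 6.0)], 1: [(2.0, 8.0)]},  # two GPUs, 10 s iteration
    0.0, 10.0, requests_blocked=3, requests_total=20,
)
print(m)  # → (10.0, 0.4, 0.15)
```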

2. Communication-Aware Resource Assignment and Scheduling

Recent frameworks such as CBA (Fu et al., 23 Dec 2025) model PP training as a mixed-integer linear program (MILP) seeking to minimize $T_{iter}$ under multi-DC optical network constraints. Decision variables $x_{r,p,i,f}$ indicate the assignment of micro-batch transmission $r$ (corresponding to a stage-to-stage data movement) to optical path $p$ and contiguous frequency-slot block $f$ on every link of $p$.

The communication latency for a request $r$ with payload $c$ traversing path $p$ with slot block $f$ is captured by the $\alpha$-$\beta$ model: $T_{comm}(r) = \alpha_p + \beta_p \cdot c + \varepsilon_p(c)$, where $\alpha_p$, $\beta_p$ are path-specific offset/bandwidth parameters updated per iteration, and $\varepsilon_p(c)$ accounts for queuing delays.
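The $\alpha$-$\beta$ model is a one-line affine estimate; a minimal sketch follows, with all parameter values chosen for illustration rather than taken from the paper.

```python
# Sketch of the alpha-beta latency model described above; parameter
# values are illustrative assumptions, not measurements from the paper.
def t_comm(alpha_p, beta_p, payload_bytes, eps=0.0):
    """T_comm(r) = alpha_p + beta_p * c + eps_p(c)."""
    return alpha_p + beta_p * payload_bytes + eps

# e.g., 5 ms path offset, 1 ns/byte (~8 Gb/s effective), 1 GB activations,
# 2 ms assumed queuing delay
latency = t_comm(alpha_p=5e-3, beta_p=1e-9, payload_bytes=1e9, eps=2e-3)
print(f"{latency:.3f} s")  # → 1.007 s
```

Because $\alpha_p$ and $\beta_p$ are re-fitted every iteration from observed transfers, the estimate tracks changing congestion without a full network model.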

Scheduling constraints rigorously ensure that no frequency slot on any link is double-booked and that frequency-slot block assignment remains contiguous on all links of a path.
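The two constraints above (no double-booking, end-to-end contiguity) reduce to a single feasibility test per candidate block: the slot range must be free on every link of the path. A minimal sketch, with an assumed data layout:

```python
# Sketch of the scheduling feasibility check described above: a slot
# block [f, f+width) is admissible only if every slot in it is free on
# every link of the path. Data layout is an assumption.
def block_feasible(path_links, f, width):
    """path_links: list of per-link occupancy lists (True = occupied)."""
    return all(
        not any(slots[f:f + width])  # no double-booking on this link
        for slots in path_links      # ... and contiguity holds end to end
    )

link_a = [False] * 80
link_b = [False] * 80
link_b[12] = True  # one occupied slot on the second hop
print(block_feasible([link_a, link_b], f=10, width=4))  # → False (hits slot 12)
print(block_feasible([link_a, link_b], f=13, width=4))  # → True
```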

3. Communication-Bound-Aware (CBA) Dynamic Resource Adaptation

The crux of CBA (Fu et al., 23 Dec 2025) is adaptive, cross-domain orchestration:

  • Detection of communication-bound tasks: the orchestrator inspects the previous schedule $S_{j-1}$ to label any micro-batch computation as communication-bound if network delays exceed prior dependency completion ($cur.start\_time > prev.completion\_time + Latency\_DC\_connect$).
  • Dynamic frequency slot demand adjustment: if a transmission was blocked last iteration, decrease its slot demand by one; if labeled communication-bound, increment by one (bounded system-wide) to secure wider spectrum and improve latency.
  • K-shortest-path search with contiguity-aware path selection: for each transmission, the framework evaluates KK candidate paths and slot blocks, calculating a fitness score

$$I(p_k, f) = \frac{C_{avail}(p_k, f)}{L(p_k)} \cdot \left(1 - \rho(p_k)\right)$$

where $C_{avail}(p_k, f)$ is the contiguity index, $L(p_k)$ the path hop count, and $\rho(p_k)$ the current frequency-slot usage fraction.
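The fitness score ranks candidate (path, slot-block) pairs; the sketch below evaluates it over a few hypothetical candidates. The contiguity-index values here are illustrative stand-ins, not computed from a real spectrum state.

```python
# Sketch of the contiguity-aware fitness score I(p_k, f) from the text;
# candidate values are illustrative assumptions.
def fitness(c_avail, hop_count, usage_fraction):
    """I(p_k, f) = (C_avail / L(p_k)) * (1 - rho(p_k))."""
    return (c_avail / hop_count) * (1.0 - usage_fraction)

candidates = [
    # (path id, contiguity index C_avail, hops L, slot-usage fraction rho)
    ("p1", 12.0, 3, 0.50),
    ("p2", 10.0, 2, 0.40),
    ("p3", 16.0, 4, 0.75),
]
best = max(candidates, key=lambda c: fitness(c[1], c[2], c[3]))
print(best[0])  # → p2: 10/2 * 0.6 = 3.0 beats 2.0 (p1) and 1.0 (p3)
```

The score deliberately favors short, lightly loaded paths with large contiguous free blocks, trading spectral efficiency for lower blocking.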

This heuristic approach enables real-time resource adaptation as network state and model demands evolve during training. No formal worst-case guarantee on solution approximation is provided.
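The dynamic slot-demand adjustment described in the bullets above can be sketched as a small state-update rule; the per-request bounds and field names below are assumptions for illustration.

```python
# Sketch of the dynamic slot-demand adjustment heuristic described
# above; MIN/MAX bounds are assumed, not from the paper.
MIN_SLOTS, MAX_SLOTS = 1, 8

def adjust_demand(demand, was_blocked, is_comm_bound):
    """Shrink demand for blocked requests; widen communication-bound ones."""
    if was_blocked:
        demand -= 1      # ask for less spectrum to avoid re-blocking
    elif is_comm_bound:
        demand += 1      # widen the block to cut transfer time
    return max(MIN_SLOTS, min(MAX_SLOTS, demand))

print(adjust_demand(4, was_blocked=True, is_comm_bound=False))   # → 3
print(adjust_demand(4, was_blocked=False, is_comm_bound=True))   # → 5
print(adjust_demand(1, was_blocked=True, is_comm_bound=False))   # → 1 (clamped)
```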

4. Performance Characterization and Benchmarks

Experimental evaluation (Fu et al., 23 Dec 2025) utilizes NSFNET (14 nodes, 21 links; 80 FS/link, 12.5 GHz; 64-QAM modulation), placing GPUs randomly in six DCs for Llama 3 models (8B, 70B, 8 PP stages).

  • Baseline algorithms: KSP-FF (K-shortest paths, first-fit assignment) and SD-FF (shortest-distance path, first-fit).
  • Key results (Llama 3 70B, GPipe, $M=128$ micro-batches):

| Metric | KSP-FF | SD-FF | CBA (Ours) |
|---|---|---|---|
| Iteration time (s) | 102.4 | 98.7 | 68.0 |
| Bubble ratio (%) | 48.1 | 45.5 | 37.9 |
| Blocking prob. (%) | 17.3 | 15.9 | 13.8 |
  • Improvements over best baseline:
    • $31.25\%$ reduction in iteration time
    • $11.96\%$ decrease in bubble ratio
    • $13.20\%$ fewer blocked requests

CBA ablation studies show that disabling communication-bound task labeling or dynamic $\alpha$-$\beta$ latency updates leads to inferior bubble ratio and blocking probability.

5. Theoretical and Algorithmic Complexity

The per-iteration complexity of CBA (Fu et al., 23 Dec 2025) is $O((P-1)\cdot M \cdot K \cdot (E\log V + W))$:

  • $K$-shortest-path search: $O(K(E \log V))$ per request,
  • Contiguity and fitness computation: $O(W)$ per path.

Given practical values (e.g., $P=8$, $M=128$, $K=4$, $W=80$), CBA remains computationally tractable even in large network topologies.
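Plugging the practical values above into the complexity expression gives a back-of-envelope work estimate; the constant factors below are order-of-magnitude assumptions, not measurements.

```python
# Back-of-envelope per-iteration work estimate from the complexity
# expression O((P-1)*M*K*(E log V + W)); constants are illustrative.
import math

P, M, K, W = 8, 128, 4, 80
E, V = 21, 14                    # NSFNET links and nodes
requests = (P - 1) * M           # inter-DC transmissions per iteration
per_request = K * (E * math.log2(V) + W)
print(requests)                  # → 896 requests per iteration
print(round(requests * per_request))  # on the order of a few 10^5 steps
```

A few hundred thousand elementary operations per iteration is negligible next to the tens of seconds each training iteration takes, which is why the heuristic can run online.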

Multi-DC optical networking for distributed machine learning entails tight co-design between application-level pipeline-parallel training and optical network resource management. CrossPipe (Chen et al., 30 Jun 2025) generalizes multi-DC pipeline scheduling as a constraint optimization model, providing both CP solver and greedy near-optimal schedules, explicitly accounting for bandwidth and latency (via the $\alpha$-$\beta$ model) and achieving up to $33.6\%$ reduction in training time compared to static schedules.

Alternate frameworks such as SPP (Luo et al., 2022), HelixPipe (Zhang et al., 1 Jul 2025), TawPipe (Wu et al., 12 Nov 2025), and BaPipe (Zhao et al., 2020) focus on device-level communication patterns, weight-passing schemes, and load-balanced stage partitioning, providing the necessary abstractions for scaling within or across DC boundaries.

CBA represents the state of the art in integrating pipeline-parallel task scheduling with real-time optical network state, adapting spectrum assignment dynamically, and maximizing utilization under stringent multi-DC constraints. Such communication-bound-aware resource assignment mechanisms are fundamental to the sustainable scaling of distributed LLM and DNN training workloads across geographically distributed datacenters.
