
Heterogeneous Distributed Linearly Separable Computation

Updated 22 January 2026
  • Heterogeneous distributed linearly separable computation is a framework for executing linear functions like matrix–vector products across diverse workers with varying storage and processing speeds.
  • It combines coding techniques such as MDS codes with convex optimization to allocate loads and minimize latency and communication costs.
  • Applications include distributed multi-task learning and convex programming, achieving significant speedups and near-optimal communication efficiency in practical heterogeneous environments.

Heterogeneous distributed linearly separable computation refers to the theory and engineering of distributed systems in which linearly separable functions (such as matrix–vector products and multi-argument linear forms) are computed across collections of workers whose storage, data assignments, and computation speeds differ. The domain encompasses optimal coding, load allocation, and communication protocols that handle heterogeneous resources while minimizing latency and communication, with theoretical underpinnings now established for arbitrary network and data-assignment patterns. The area draws on information theory, coding theory, distributed optimization, and modern combinatorial constructions.

1. System Models: Task, Heterogeneity, and Coding

A canonical task is to compute $y = Ax$, or more generally $K_c$ linear combinations of $K$ messages, $y = F_1 W$, across $N$ workers, under the constraint that the data distribution $\{\mathcal{S}_n\}$ (the collection of which datasets each worker sees) can be arbitrary and the workers can differ in storage capacity and processing speed. Workers may be further grouped by similar speed or straggling characteristics (e.g., as in group-based shifted-exponential models).
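To make the model concrete, here is a minimal sketch of computing $y = F_1 W$ under an arbitrary data assignment. The scalar messages and the rule that exactly one storing worker is "responsible" for each message are illustrative simplifications, not details from the source:

```python
# Toy model: K_c linear combinations y = F1 @ W computed across workers
# with arbitrary, heterogeneous data assignment. Scalar messages and the
# single-responsibility rule are illustrative simplifications.
K, Kc = 4, 2
W = [2, 3, 5, 7]                       # K messages (scalars for simplicity)
F1 = [[1, 0, 1, 0],                    # Kc x K demand matrix
      [0, 1, 0, 1]]

assignment = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}   # worker n stores subset S_n

# Avoid double counting overlapping storage: one responsible worker per message.
responsible = {}
for n in sorted(assignment):
    for k in sorted(assignment[n]):
        responsible.setdefault(k, n)

def worker_response(n):
    """Partial contribution of worker n to every demanded combination."""
    return [sum(F1[c][k] * W[k] for k in assignment[n] if responsible[k] == n)
            for c in range(Kc)]

# The master sums the worker responses coordinate-wise.
y = [sum(vals) for vals in zip(*(worker_response(n) for n in assignment))]
print(y)   # [7, 10], matching F1 @ W
```

Because each message contributes through exactly one worker, the coordinate-wise sum of the responses equals the demanded linear combinations regardless of how unevenly the messages are assigned.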

Heterogeneity enters at several layers:

  • Data assignment: Each worker $n$ stores a (possibly unique) subset $\mathcal{S}_n$ of the $K$ messages or data blocks. The assignment need not be cyclic or symmetric.
  • Computation speed and reliability: Workers may process tasks at different rates ($s_n$, $\mu_{(j)}$, $\alpha_{(j)}$), influencing both static and stochastic latency.
  • Storage: Each worker may have a distinct storage capacity $\sigma_n$, the number of coded or uncoded blocks it can hold.

The foundational coding tool is the use of an $L$-out-of-$N$ maximum distance separable (MDS) code, which enables any $L$ worker outputs to suffice for recovery. The MDS construction is adapted using group symmetries and zero-enforcement to handle arbitrary assignment matrices and communication patterns (Kim et al., 2019, Woolsey et al., 2020, Zhang et al., 15 Jan 2026, Cheng et al., 24 Jul 2025).
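As a concrete (illustrative) instance of the $L$-out-of-$N$ property, a Vandermonde generator over a small prime field gives an MDS code via polynomial evaluation; the field size and evaluation points below are assumptions for the sketch:

```python
# Sketch of an L-out-of-N MDS code via polynomial evaluation over GF(257):
# any L of the N coded outputs suffice to recover the L data symbols.
# Field size and evaluation points are illustrative choices.
P = 257          # prime modulus
L, N = 2, 4      # recover from any L of N worker outputs

def encode(data):
    """Evaluate the degree-(L-1) polynomial with coefficients `data` at x = 1..N."""
    return [sum(d * pow(x, i, P) for i, d in enumerate(data)) % P
            for x in range(1, N + 1)]

def poly_mul_linear(poly, root):
    """Multiply a polynomial (low-to-high coefficients) by (x - root), mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def decode(points):
    """Lagrange-interpolate L (x, y) pairs to recover the data symbols."""
    coeffs = [0] * L
    for j, (xj, yj) in enumerate(points):
        num, denom = [1], 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = poly_mul_linear(num, xm)
                denom = (denom * (xj - xm)) % P
        scale = yj * pow(denom, P - 2, P) % P     # Fermat modular inverse
        for i in range(L):
            coeffs[i] = (coeffs[i] + scale * num[i]) % P
    return coeffs

coded = encode([5, 7])                               # one symbol per worker
recovered = decode([(2, coded[1]), (4, coded[3])])   # any L workers respond
print(recovered)   # [5, 7]
```

Here workers 2 and 4 happen to respond first; any other pair would decode to the same result, which is exactly the straggler tolerance the MDS property provides.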

2. Load Allocation and Latency Optimization

For heterogeneous computational speeds and groupings, the core optimization problem is to minimize expected job completion time (latency) under coded splitting. Given an $(n,k)$ MDS code and the partitioning of $n$ coded jobs across $N$ workers (possibly with grouping), the question is how to select the allocation vector $(l_1,\dots,l_N)$, the number of coded tasks per worker, in a manner matching individual capabilities.

For joint grouped heterogeneity, the optimal assignment is derived as follows:

  • The per-group latency is lower-bounded using closed forms for order statistics of shifted exponentials: $\lambda_{r:N} \geq \max_j l_{(j)}\xi_j(r_j)$, where $\xi_j(r_j)$ incorporates both shift and rate.
  • The problem reduces to a convex program: minimize $\max_j l_{(j)}\xi_j(r_j)$ subject to $\sum_j r_j l_{(j)} = k$.
  • At optimality, all groups "tie": $l_{(1)}\xi_1(r_1) = \cdots = l_{(G)}\xi_G(r_G)$. The unique minimizer for each $r_j$ is found via the lower branch of the Lambert $W$ function.
  • The resulting solution is asymptotically exactly optimal as $N\to\infty$. Faster groups receive proportionally more load, and all groups finish simultaneously in expectation.

Numerical experiments confirm 10–50× speedups in high-heterogeneity regimes, with the optimal assignment yielding completion times nearly matching lower bounds and outperforming earlier heuristics by an order of magnitude as $N$ grows (Kim et al., 2019).
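A deterministic caricature of the tie condition can be sketched by replacing the order-statistics factor $\xi_j(r_j)$ with each group's mean per-task time (shift plus one over rate). This is a simplification for intuition, not the paper's Lambert-$W$ solution:

```python
# Simplified sketch of the tie condition. Assumption: the order-statistics
# factor xi_j(r_j) is replaced by the mean per-task time shift + 1/rate,
# so this is a deterministic caricature, not the published solution.
k = 120                                   # total coded tasks needed to decode
groups = [                                # (workers r_j, shift, exponential rate)
    (4, 1.0, 2.0),                        # fast group
    (6, 2.0, 0.5),                        # slow group
]

mean_task_time = [shift + 1.0 / rate for _, shift, rate in groups]

# Tie condition: l_j * t_j equal across groups, subject to sum_j r_j * l_j = k.
tau = k / sum(r / t for (r, _, _), t in zip(groups, mean_task_time))
loads = [tau / t for t in mean_task_time]

for (r, _, _), l, t in zip(groups, loads, mean_task_time):
    print(f"{r} workers: {l:.1f} tasks each, expected finish ~{l * t:.1f}")
```

The fast group receives the larger per-worker load, and both groups finish at the same expected time, which is the qualitative behavior of the optimal allocation.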

3. Universal Coding and Communication Trade-offs in Arbitrary Assignment

For arbitrary, heterogeneous data assignment, the computable dimension and required communication are captured by the assignment matrix $A(n,k)$ and a combinatorial forbidden-set analysis. Central results (Zhang et al., 15 Jan 2026) include:

  • Communication model: Each worker $n$ sends at most $R$ coded linear combinations of the messages it can access, subject to encodability constraints.
  • Trade-off parameterization: Let $\mathcal{U}$ denote the family of forbidden $(T,S)$ submatrices, in which none of the workers in $T$ stores any of the data items in $S$ and $R|T| + |S| > K_c$. The derived converse is $K_c \leq \min_{(T,S)\in\mathcal{U}} R(N - |T|)$.
  • Achievability: There exists a universal linear coding scheme matching the converse in many cases, with construction based on random submatrices and nested null-space arguments to enforce encodability and decodability.

This methodology remains valid under fractional communication cost; subpacketization enables finer granularity. Thus, the characterization of achievable and converse trade-offs in arbitrary heterogeneous networks is now tight in important regimes (Zhang et al., 15 Jan 2026, Cheng et al., 24 Jul 2025).

4. Optimal Assignment Algorithms and Construction Methods

The optimal computation-task-to-worker assignment proceeds in two stages: first, a convex (often water-filling style) optimization is solved for the fractional load vector $(l_n)$; then the fractional solution is deterministically rounded to integer or blockwise assignments via a "filling" algorithm, which enforces the combinatorial constraints (each block is assigned to the required number of workers, and the aggregate matches the fractional solution).

  • The iterative filling algorithm (see pseudocode in (Woolsey et al., 2020)) assigns, at each step, the current block to the machines with the highest residual loads, with each increment sized so that after subtraction at least one machine exactly reaches its target allocation.
  • The procedure is guaranteed to terminate in at most $N$ steps and, once completed, ensures that each machine's final allocation matches (to integer granularity) its optimal fractional load.
  • For systems with both speed and storage heterogeneity, machines with low storage and slow speed fill up first; the fastest/highest-storage machines receive residual load until total work is allocated.

This assignment framework achieves minimum makespan and is fully compatible with coded elastic computing and arbitrary data assignment (Woolsey et al., 2020).
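The filling step above can be sketched as follows, assuming each of $F$ coded blocks must be replicated on $m$ workers and the fractional loads sum to $mF$. The largest-residual-first rule is one common realization of "filling", not a verbatim transcription of the published pseudocode:

```python
# Sketch: round fractional loads to a blockwise assignment by giving each
# of F blocks to the m workers with the largest residual load. Illustrative
# realization of "filling", not the paper's exact pseudocode.
def fill(fractional_loads, F, m):
    """Return, for each block, the m workers it is assigned to."""
    loads = list(fractional_loads)
    placement = []
    for _ in range(F):
        # Workers with the most unassigned load absorb the next block.
        chosen = sorted(range(len(loads)), key=lambda n: -loads[n])[:m]
        for n in chosen:
            loads[n] -= 1            # one block's worth of work consumed
        placement.append(chosen)
    return placement

# 4 workers, 6 blocks replicated on m = 2 workers each; loads sum to 12.
plan = fill([4.0, 4.0, 2.0, 2.0], F=6, m=2)
counts = [sum(n in blk for blk in plan) for n in range(4)]
print(counts)   # per-worker block counts: [4, 4, 2, 2], matching the loads
```

When the fractional loads are integral, as here, the greedy rule reproduces them exactly; with non-integral loads it rounds to within one block per worker.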

5. Communication-Minimizing Schemes and Graph-Based Converse Bounds

In scenarios where total communication (rather than latency) is the bottleneck, the structure of the data assignment, query (number and type of linearly separable functions), and statistical dependencies all affect minimum necessary rates.

  • The application of Körner's characteristic-graph entropy enables tight upper and (in many regimes) lower bounds on the required sum communication, accounting for storage heterogeneity, function structure, and data correlation (Malak et al., 2024).
  • The general upper bound for $K_c$ linearly separable functions over $\mathrm{GF}(2)$ under arbitrary assignment is

$$R_{\mathrm{sum}} \leq \sum_{i=1}^{N_r} H_{G_{X_i}^\cup}(X_i),$$

where $G_{X_i}^\cup$ is the union of the characteristic graphs for the queries.

  • For cyclic homogeneous assignment and i.i.d. uniform data, this specializes to $R_{\mathrm{sum}} \leq \min\{K_c, \Delta\} N_r$, with gains (relative to Slepian–Wolf) that are exponential for highly skewed data or large overlap (Malak et al., 2024).

Recent work constructs joint uplink-downlink coding schemes (uplink: worker-to-master; downlink: broadcast-for-reconstruction) that provably meet the theoretical minima for both communication stages under mild conditions, via matrix decomposition (an uplink encoder with enforced zeros, a downlink MDS code) and Hall-type matching constraints (Cheng et al., 24 Jul 2025).
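Hall-type matching constraints of this kind can be checked with a standard augmenting-path bipartite matching. The routine below is a generic sketch (the instances are illustrative, not from the paper): it tests whether every message can be served by a distinct worker that stores it:

```python
# Generic Hall-condition check via augmenting paths: does a matching exist
# that pairs every message with a distinct storing worker? The instances
# below are illustrative, not drawn from the cited construction.
def has_system_of_distinct_workers(stores):
    """stores[k] = set of workers holding message k; True iff a matching exists."""
    match = {}                       # worker -> message currently matched to it

    def augment(k, seen):
        for n in stores[k]:
            if n in seen:
                continue
            seen.add(n)
            # Take a free worker, or evict and re-route the current occupant.
            if n not in match or augment(match[n], seen):
                match[n] = k
                return True
        return False

    return all(augment(k, set()) for k in range(len(stores)))

print(has_system_of_distinct_workers([{0, 1}, {1}, {0, 2}]))   # True
print(has_system_of_distinct_workers([{0}, {0}, {0, 1}]))      # False
```

By König's theorem this succeeds exactly when every set of messages is collectively stored by at least as many distinct workers, which is the Hall condition the decomposition relies on.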

6. Applications and Extensions

These theoretical frameworks and coding schemes extend to:

  • Distributed multi-task learning (DMTL) with nontrivial uplink/downlink structure, where coding-theoretic relations provide minimal-communication collaborative learning even under arbitrary placement and heterogeneity (Cheng et al., 24 Jul 2025).
  • Distributed classification and convex programming, where distributed multiplicative-weights protocols yield optimal communication for linearly separable learning and for fixed/fractional error rates in high-dimensional settings (III et al., 2012).
  • Extension to multi-linear and non-linear functions via generalized characteristic entropy methods, which have demonstrated exponential reduction in required communication for certain function/data regimes (Malak et al., 2024).

In summary, heterogeneous distributed linearly separable computation is now supported by a unified theory detailing the limits and explicit recipes for code construction, load assignment, and communication protocol, functioning robustly under arbitrary network and data heterogeneities, and achieving order-optimal efficiency in both latency and communication (Kim et al., 2019, Woolsey et al., 2020, Zhang et al., 15 Jan 2026, Cheng et al., 24 Jul 2025, Malak et al., 2024).
