Distributed High-Dimensional Mean Estimation

Updated 29 January 2026

Distributed high-dimensional mean estimation is a framework for computing the mean of decentralized, high-dimensional data under strict communication constraints.
Protocols like binary search, sparse thresholding, and lattice-based quantization offer precise trade-offs between mean-squared error and communication cost.
Advanced schemes integrate robustness and privacy measures, exploiting inter-client similarity to achieve reliable performance in federated learning.

Distributed high-dimensional mean estimation refers to the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a population or dataset, where the data are distributed across $m$ remote machines or clients, each of which may operate under stringent communication constraints. This regime is fundamental in distributed statistical learning, federated optimization, and large-scale signal processing. The literature rigorously characterizes the interplay between statistical efficiency (mean squared error, bias), communication complexity (bit-budget), robustness (adversarial and privacy considerations), and the impact of high dimensionality and structural priors (such as sparsity or inter-client similarity).

1. Problem Formulation and Central Lower Bounds

In the canonical formulation, each of $m$ machines holds $n$ i.i.d. samples $X^{(j,1)},\ldots,X^{(j,n)} \sim \mathcal{N}(\theta, \sigma^2 I_d)$ with unknown mean $\theta \in [-1,1]^d$ (Garg et al., 2014). The mechanisms governing communication may be interactive or restricted to a single round of simultaneous (one-shot) messages. Let $\hat{\theta}$ denote the estimator (possibly randomized or a function of a communication transcript $Y$ ), and define minimax mean-squared risk

$R(\Pi, \hat{\theta}) = \sup_{\|\theta\|_\infty \leq 1} \mathbb{E}_{\Pi, X}\left[ \|\hat{\theta}(Y) - \theta\|_2^2 \right].$

The goal is to achieve the minimax rate $R_{\minimax} = \frac{d\,\sigma^2}{mn}$ (centralized oracle risk) while minimizing total communication cost $\mathsf{CC}(\Pi)$ .

A direct-sum theorem establishes that the information (and thus communication) cost in $d$ dimensions is at least $d$ times that for $d=1$ (Garg et al., 2014). Matching and nearly-matching lower bounds are:

Simultaneous (one-round) protocols: $\Omega(md)$ bits to reach the minimax squared loss (Garg et al., 2014).
Interactive protocols: $\Omega(md / \log m)$ (Garg et al., 2014).

These bounds are tight up to logarithmic factors: a binary-search interactive protocol achieves $O(md)$ bits and minimax risk in general dimensions (Garg et al., 2014).

2. Communication-Optimal Algorithms: Protocols and Trade-offs

Interactive Binary Search Protocol

The binary-search–based interactive protocol (see detailed pseudocode structure in (Garg et al., 2014)) recursively refines coordinate-wise intervals and solicits bits from successive machines, shrinking the uncertainty by a $3/4$ factor per round. The algorithm uses $O(md)$ bits and achieves final mean-squared error $O(d/(mn))$ .

Sparse Mean Case

When $\theta$ is known to be $s$ -sparse, a simple thresholding protocol yields communication savings by a factor up to $d/s$ (Garg et al., 2014). For

$\alpha = 1$ , communication $\tilde O(sm)$ and risk $O(s\sigma^2/(mn))$ .
$\alpha = d/s$ , communication $\tilde O(dm)$ and risk $O(d\sigma^2/(mn))$ . Any intermediate $\alpha$ gives a trade-off between communication and error.

Lattice-Based Quantization for Arbitrary Input Norms

A lattice quantization protocol (encoding via a $\epsilon$ -lattice with $q$ -coloring) attains mean-squared error $O(y^2/q)$ with $O(d \log q)$ bits per node, where $y^2$ is the maximum pairwise squared distance between client vectors (Davies et al., 2020). Unlike norm-dependent bounds, this approach’s error depends only on how clustered the vectors are, not their global scale. Matching lower bounds confirm optimality in regimes where input vectors are close (Davies et al., 2020).

The following table summarizes key communication/error trade-offs and their settings:

Protocol/Setting	Per-node Bits	Mean-Squared Error
Binary-search (full)	$O(d)$	$O(d/(mn))$
Thresholded (s-sparse)	$\tilde O(s)$	$O(s\sigma^2/(mn))$
Lattice-quantized	$O(d \log q)$	$O(y^2/q)$
One-bit (DRIVE)	$d + O(1)$	$O(1/(n))$ per unit norm

3. Communication-Efficient Quantization and Correlation-Aware Schemes

In high-dimensional regimes where vectors are similar across clients (as in distributed optimization or federated learning after many rounds), exploiting correlation enables further gains:

Wyner–Ziv–type quantizers (Q_{WZ}): Structured random rotations (Hadamard or similar), modulo scalar quantization, and partial coordinate sharing achieve mean squared error

$\mathbb{E}[ \|\hat{\mu} - \mu \|_2^2 ] \le (79 \log_2 k + 26)\ \sum_{i=1}^n \frac{d \Delta_i^2}{n^2 r}$

where $\Delta_i^2$ is the bound on $\|x_i - y_i\|_2^2$ and $r$ is per-client budget in bits (Liang et al., 2021). By exploiting inter-client similarity (chaining with side information), further reductions are achieved—whenever the data form tight “chains” of low pairwise distances, overall MSE improves by a strict constant factor (Liang et al., 2021).

Collaborative compressors: Four classes—NoisySign, HadamardMultiDim, SparseReg, and OneBit—match or exceed standard independent-quantization protocols when vectors are nearly identical. Their $\ell_2$ , $\ell_\infty$ , and cosine errors are shown to decay to zero rapidly as similarity measures $\Delta_2$ , $\Delta_\infty$ , or $\Delta_\mathrm{corr}$ tend to zero, offering exponential-in- $m$ convergence for fixed per-node bits (Vardhan et al., 26 Jan 2026).
DRIVE (Deterministically Rounding randomly rotated VEctors): Achieves $O(1/n)$ normalized MSE with $d+O(1)$ bits per client using fast Hadamard rotations and per-instance scaling (Vargaftik et al., 2021).

These approaches are effective even for very large $d$ (millions), with computational cost $O(d \log d)$ for encoding and decoding via fast transforms (Vargaftik et al., 2021).

4. Robust and Privacy-Preserving Distributed Mean Estimation

Byzantine robustness, privacy (differential privacy, DP), and adversarial tolerance are vital in federated and distributed learning:

Byzantine-Robust Estimators: A semi-verified mean estimation procedure splits the space into an adversarial “bad” subspace and its orthogonal complement (Zhao et al., 2023). In $S^\perp$ , the mean can be robustly estimated from contaminated data, while in $S$ the mean is estimated from a small, trusted auxiliary sample. The overall MSE is

$\mathbb{E}[ \|\widehat{\mu} - \mu^* \|_2^2 ] \le \frac{3 \sigma^2 p}{N_A} + \frac{15 \lambda_c}{2\alpha} + o(1)$

for $p$ -dimensional $S$ and contamination fraction $1-\alpha$ (Zhao et al., 2023).

Differentially Private DME (CorDP-DME): The CorDP framework parameterizes the noise correlation between client messages. By tuning the correlation from independent (local DP) to fully anti-correlated (cryptographic secure aggregation), utility (MSE) can be improved from $O(d/(n\epsilon^2))$ to as low as $O(d/(n^2\epsilon^2))$ , with precise formulas balancing resilience to dropouts/collusion and estimation accuracy (Vithana et al., 2024).

The table below contrasts DP protocols:

Mechanism	Utility (MSE)	Dropouts/Collusion Resilience
Local DP (LDP)	$O(d/(n \epsilon^2))$	Strong (no coordination needed)
SecAgg/CDP	$O(d/(n^2 \epsilon^2))$	Weaker, multi-round, fragile
CorDP-DME	$O(d/((n-c)^2 \epsilon^2))$	Balanced by ρ, user-tunable

5. Minimax-Optimal Aggregation and One-Shot Weighted Estimation

Recent advances provide practical, statistically efficient, one-shot (single-round) aggregation without iterative communication:

Inverse-Variance Weighted Fusion: Each machine transmits only its local mean and diagonal empirical variances ($2d$ real values), and the master fuses with coordinate-wise inverse-variance weights:

$(\hat{\mu})_j = \sum_{i=1}^m w_{i,j} \cdot (\hat{\mu}_i)_j, \quad w_{i,j} = \frac{n_i/\sigma^2_{i,j}}{\sum_{\ell=1}^m n_\ell/\sigma^2_{\ell,j}}$

achieving exact minimax $\mathbb{E}\|\hat{\mu} - \mu\|^2 = \sum_{j=1}^d 1/\left( \sum_{i=1}^m n_i/\sigma^2_{i,j} \right)$ with optimal communication $O(md)$ (Lu et al., 2022).

Robust Streaming and Consensus Algorithms: Distributed trimmed-mean with communication-graph–based consensus achieves with high probability

$\|\mu^i_t - \mu^*\|_2 \le O(\sigma\sqrt{d\varepsilon}) + O\left(\sigma\sqrt{\frac{d}{m t}}\right)$

at each node, for adversarial contamination fraction $\varepsilon$ , $m$ nodes, $t$ local samples (Yao et al., 2022).

6. Design Principles, Structural Exploitation, and Open Questions

Key structural features fundamentally impact cost/error trade-off:

Direct-sum effect: Each coordinate essentially incurs separate communication cost if unstructured; $\Omega(md)$ is generically necessary (Garg et al., 2014).
Structure (sparsity, low-rank): For $s$ -sparse mean, transmission cost can be reduced by $d/s$ ; conjecturally no protocol beats CC $\times$ risk $\gtrsim s d \sigma^2 / (mn)$ up to $\mathrm{polylog}$ (Garg et al., 2014).
Correlated/clustered vectors: Protocols leveraging inter-node similarity (e.g., correlation-aware compressors, Wyner–Ziv schemes, collaborative codes) exhibit error decaying rapidly as the clients' vectors become more similar, with precise graceful degradation guarantees quantifying this transition (Liang et al., 2021, Vardhan et al., 26 Jan 2026).

Open directions include the design of statistical protocols that attain simultaneous optimality under communication, privacy, robustness, and structural assumptions in a single unified framework (Garg et al., 2014, Vardhan et al., 26 Jan 2026, Vithana et al., 2024).

7. Practical Considerations and Implementation

In large-scale/in-practice systems:

Hadamard-based schemes are widely used for $O(d \log d)$ transforms enabling efficient random rotation (Vargaftik et al., 2021, Davies et al., 2020).
Bit-packing for quantized outputs typically uses $d + O(\log d)$ total bits per vector (Vargaftik et al., 2021).
Parameter selection for correlation-aware protocols involves explicit balancing of error, communication, and user similarity, often via greedy chaining or region-driven pair selection (Liang et al., 2021).
Empirical findings consistently show that widely-separated vectors limit gains from correlation-aware protocols, whereas clustered or smooth update regimes in federated optimization benefit substantially from collaborative coding (Davies et al., 2020, Vardhan et al., 26 Jan 2026).

In summary, current research establishes that distributed high-dimensional mean estimation is fundamentally governed by the direct-sum effect in the absence of structure, but that compressibility, inter-node similarity, robust estimation, and privacy requirements each enable and constrain protocol design. Optimal and near-optimal trade-offs have now been precisely characterized under a range of statistical, adversarial, and resource models.