
Distributed HDMM: Private, Secure Matrix Analytics

Updated 24 December 2025
  • Distributed HDMM is a protocol for privately answering high-dimensional linear queries in distributed settings by combining secure aggregation with a matrix mechanism.
  • It operates in three rounds—strategy broadcast, local noise injection with secure aggregation, and decoding—to accurately reconstruct query responses.
  • Empirical evaluations show that Distributed HDMM achieves near-centralized accuracy with scalable performance and significantly lower error than local or shuffle-based methods.

The Distributed High-Dimensional Matrix Mechanism (Distributed HDMM) is a protocol that enables differentially private answering of linear query workloads over high-dimensional distributed data, achieving the accuracy of centralized matrix mechanisms without relying on a trusted curator. Distributed HDMM integrates secure aggregation protocols with the matrix mechanism to guarantee privacy and robustness in adversarially controlled environments, enabling practical deployment in scenarios with thousands of clients and large, complex query workloads (Sedimo et al., 17 Dec 2025).

1. Problem Formulation

Distributed HDMM addresses the private computation of linear query workloads over distributed datasets. Consider $n$ clients $c_1, \ldots, c_n$, each holding a record multiset $I_i \subseteq \mathcal{D}$ from a domain $\mathcal{D}$ of size $d = |\mathcal{D}|$. The distributed dataset $I$ is the union of all local $I_i$. Each $I_i$ is encoded by its histogram vector $x_i \in \mathbb{N}^d$, such that $x = \sum_{i=1}^n x_i$.

Given a query workload specified by $W \in \mathbb{R}^{q \times d}$, the goal is to privately approximate the true workload answers $Wx \in \mathbb{R}^q$ under zero-concentrated differential privacy (zCDP) with parameter $\rho$ (convertible to standard $(\varepsilon, \delta)$-DP as needed).

This extends the classical HDMM, which previously required a trusted server holding $x$, to settings where data remains decentralized and no single party observes all records, while preserving strong privacy and utility guarantees (Sedimo et al., 17 Dec 2025).
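To make the setup concrete, the following sketch builds the histogram encoding and workload answers for a tiny hypothetical domain; the domain size, records, and queries are illustrative assumptions, not from the paper.

```python
import numpy as np

# Hypothetical toy domain: one attribute with d = 4 possible values.
d = 4

# Two clients' local record multisets, encoded as histograms x_i in N^d.
records_c1 = [0, 0, 2]                       # client 1 holds three records
records_c2 = [1, 3]                          # client 2 holds two records
x1 = np.bincount(records_c1, minlength=d)    # [2, 0, 1, 0]
x2 = np.bincount(records_c2, minlength=d)    # [0, 1, 0, 1]

# Global histogram x = sum_i x_i (never materialized by any single party
# in the actual protocol).
x = x1 + x2

# Workload W: two example linear queries, a total count and a range count.
W = np.array([
    [1, 1, 1, 1],   # total number of records
    [0, 0, 1, 1],   # count of records with value >= 2
], dtype=float)

true_answers = W @ x   # the quantity Wx the protocol approximates privately
print(true_answers)    # -> [5. 2.]
```

The protocol's goal is to release a differentially private estimate of `true_answers` without any party ever seeing `x`.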

2. Distributed HDMM Protocol

Distributed HDMM operates in three rounds within the $\mathcal{F}_{\mathrm{agg}}$-hybrid model, where $\mathcal{F}_{\mathrm{agg}}$ denotes a secure aggregation functionality:

1. Strategy Computation and Broadcast:

The server computes an optimal strategy matrix $A = \mathsf{optimize}(W) \in \mathbb{R}^{k \times d}$, for some $k \le d$, selected to minimize the expected error under HDMM. The $L_2$-sensitivity $\Delta_2(A) = \max_{x \sim x'} \|Ax - Ax'\|_2$ (where $x, x'$ differ in one record) is computed. The server broadcasts $A$ and $\Delta_2(A)$ to all clients.

2. Local Measurement, Noise Injection, and Secure Aggregation:

Each client $c_i$ (a) computes $m_i = A x_i \in \mathbb{R}^k$, (b) discretizes by $v_i = \lfloor \gamma m_i \rfloor \in \mathbb{Z}^k$ for a large scaling factor $\gamma$, (c) adds discrete Gaussian noise $\eta_i \sim \mathcal{N}_{\mathbb{Z}^k}(0, \sigma^2 I)$, where

$$\sigma^2 = \frac{\gamma^2 \Delta_2(A)^2}{2 (1-\theta) n \rho}$$

and $\theta < 1/2$ bounds the corrupted (non-noise-contributing) client fraction, (d) forms $\hat{v}_i = v_i + \eta_i$, (e) reduces modulo a prime $p > 2 \max_i \|\hat{v}_i\|_\infty$ to obtain $\hat{m}_i \in \mathbb{F}_p^k$, and (f) submits $\hat{m}_i$ via secure aggregation. The server learns only the sum $\hat{M} = \sum_{i=1}^n \hat{m}_i \in \mathbb{F}_p^k$.

3. Decoding and Post-Processing:

The server decodes $\hat{M}$, inverting the mod-$p$ reduction and the scaling, yielding

$$\hat{M}_d = \frac{1}{\gamma}\,\mathrm{Decode}(\hat{M}) \approx \sum_{i=1}^n m_i + \frac{1}{\gamma} \sum_{i=1}^n \eta_i$$

and finally reconstructs the workload answers as $\hat{a} = W A^+ \hat{M}_d \in \mathbb{R}^q$, where $A^+$ denotes the pseudoinverse of $A$ (so that $A^+ \hat{M}_d$ estimates the histogram $x$), releasing $\hat{a}$ to the analyst (Sedimo et al., 17 Dec 2025).
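The three rounds above can be simulated end to end in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: the strategy $A$ is taken to be $W$ itself (the $\mathsf{optimize}(W)$ step is not modeled), discrete Gaussian noise is approximated by rounding continuous Gaussians, and secure aggregation is modeled as a plain modular sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (theta = 0: all clients honest and noise-contributing).
d, n, gamma, rho, theta = 4, 100, 2**10, 1.0, 0.0
W = np.array([[1., 1., 1., 1.],    # total count
              [0., 0., 1., 1.]])   # a range count
A = W                                         # stand-in for optimize(W)
delta2 = np.linalg.norm(A, axis=0).max()      # L2 sensitivity: max column norm
sigma2 = gamma**2 * delta2**2 / (2 * (1 - theta) * n * rho)
p = 2**61 - 1                                 # illustrative prime modulus

def client_message(x_i):
    m_i = A @ x_i                             # round 2a: local measurement
    v_i = np.floor(gamma * m_i)               # round 2b: discretize
    eta = np.round(rng.normal(0.0, np.sqrt(sigma2), size=v_i.shape))  # 2c
    return (v_i + eta).astype(np.int64) % p   # rounds 2d-2e: reduce into F_p

# Each client holds one record, encoded as a one-hot histogram.
xs = np.eye(d)[rng.integers(0, d, size=n)]

# Round 2f: secure aggregation, modeled here as a plain modular sum.
M_hat = np.zeros(A.shape[0], dtype=np.int64)
for x_i in xs:
    M_hat = (M_hat + client_message(x_i)) % p

# Round 3: centered lift mod p, undo the gamma scaling, reconstruct answers.
decoded = ((M_hat + p // 2) % p) - p // 2
M_d = decoded / gamma
a_hat = W @ np.linalg.pinv(A) @ M_d
print(np.abs(a_hat - W @ xs.sum(axis=0)).max())   # small noise-driven error
```

The centered lift works because the true aggregate is far smaller than $p/2$, so the modular sum can be decoded exactly before rescaling.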

3. Differential Privacy Guarantees

Distributed HDMM achieves $(\rho, 0)$-zCDP for the privatized workload output. If each client adds discrete Gaussian noise with the variance above, the honest clients collectively ensure that the output sum $\hat{M}_d$ is equivalent to $Ax$ plus noise of variance $\Delta_2(A)^2 / (2\rho)$ in each coordinate (equivalently, variance $\frac{\gamma^2 \Delta_2(A)^2}{2\rho}$ before rescaling by $1/\gamma$).

Using the distributed discrete-Gaussian lemma [Kairouz et al., 2021], the aggregate noise preserves zCDP up to an exponentially small correction $\kappa(\gamma, n)$. Applying the standard tail-bound conversion from zCDP to $(\varepsilon, \delta)$-DP yields

$$\varepsilon = \rho + 2\sqrt{\rho \ln(1/\delta)} + \kappa(\gamma, n)$$

for any $\delta > 0$.
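The conversion above is easy to evaluate numerically. In this sketch the $\kappa(\gamma, n)$ correction is dropped (it is exponentially small), and $\rho = 0.0175$ is an illustrative choice picked so that $\varepsilon$ lands near 1 at $\delta = 10^{-6}$, matching the evaluation regime described below.

```python
import math

# zCDP -> (epsilon, delta)-DP conversion, ignoring the exponentially
# small kappa(gamma, n) correction term.
def zcdp_to_eps(rho: float, delta: float) -> float:
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

print(round(zcdp_to_eps(0.0175, 1e-6), 3))   # -> 1.001
```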

Thus, Distributed HDMM’s privacy guarantee nearly matches the central model—assuming an honest majority and correct local implementation of noise injection by clients—without reliance on a trusted aggregator (Sedimo et al., 17 Dec 2025).

4. Security and Threat Model

Distributed HDMM assumes an adversarial environment comprising an untrusted server and up to $\theta n$ malicious clients, with at least $(1-\theta) n$ honest clients to provide the noise necessary for privacy. The secure aggregation protocol $\mathcal{F}_{\mathrm{agg}}$ ensures that no participant learns any individual client $c_i$'s contribution.

  • In the semi-honest model, all parties follow the protocol, but may try to infer additional information. Privacy is retained.
  • In the malicious model, some clients or the server may actively deviate from the protocol. Confidentiality remains, but correctness is not assured unless clients also provide zero-knowledge input proofs (e.g., ACORN, EiFFeL).
  • The server can select a non-optimal $A$, but since the sensitivity $\Delta_2(A)$ is computed locally by clients, their local noise ensures privacy for any $A$ (Sedimo et al., 17 Dec 2025).
  • Honest-majority noise: $\theta < 1/2$ is required for privacy.

5. Computational and Communication Complexity

Let $k$ denote the number of measurements (rows of $A$):

  • Client computation: $O(dk)$ for $A x_i$, plus $O(k \log n)$ for secure-aggregation masking; $O(kd + k \log n)$ in total.
  • Client communication: $O(k + \log n)$ field elements per round.
  • Server computation: $O(nk + \mathsf{optimize}(W))$ for aggregation across all clients and strategy optimization; $O(nk \log n)$ for unmasking.
  • Server communication: $O(nk + n \log n)$.

Because secure aggregation scales polylogarithmically with $n$, Distributed HDMM is practical even for thousands to millions of clients. The total overhead scales linearly in $k$ (i.e., with the number of queries and the size of the measurement matrix) but only polylogarithmically in $n$ (Sedimo et al., 17 Dec 2025).
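A back-of-envelope model makes these asymptotics tangible. The constants here are assumptions for illustration (an 8-byte field element and a hypothetical $k = 44{,}000$ measurements), not figures reported in the paper.

```python
import math

FIELD_BYTES = 8  # assumed size of one element of F_p for a ~64-bit prime

def client_comm_bytes(k: int, n: int) -> int:
    # O(k + log n) field elements per client per round
    return (k + math.ceil(math.log2(n))) * FIELD_BYTES

def server_comm_bytes(k: int, n: int) -> int:
    # O(nk + n log n): the server handles one client-sized message per client
    return n * client_comm_bytes(k, n)

for n in (100, 1_000, 3_000):
    print(f"n={n}: client ~{client_comm_bytes(44_000, n)/1e3:.0f} KB, "
          f"server ~{server_comm_bytes(44_000, n)/1e6:.0f} MB")
```

With these assumed constants, per-client cost is essentially independent of $n$ (the $\log n$ term is negligible next to $k$), while server cost grows linearly in $n$.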

6. Empirical Evaluation

Sedimo et al. implemented Distributed HDMM using the Olympia simulator, evaluating on:

  • Census SF1: Thousands of high-dimensional counting queries over the 2010 U.S. Census summary file.
  • Adult (UCI): Two-way marginal queries.

Experiments with $n \in \{100, 1000, 3000\}$ clients examined both semi-honest and malicious settings at a $\rho$-zCDP budget equivalent to $\varepsilon \approx 1$ with $\delta = 10^{-6}$. Key findings:

  • Runtime: For 1,000 clients, semi-honest Distributed HDMM completes end-to-end in ~4.4 s; the average client cost is ~80 ms and the server cost ~4 s. Under the malicious model, total runtime remains under 10 s.
  • Communication: Per-client cost is ~350 KB; the server receives ~350 MB in total for 1,000 clients.
  • Utility: $\ell_2$ RMSE matches central HDMM at $\theta = 0$; error increases slowly with $\theta$, remaining within a small constant factor. Local-DP and shuffle-model baselines incur $10\times$ to $100\times$ higher error (Sedimo et al., 17 Dec 2025).

This suggests Distributed HDMM achieves near-optimal accuracy with orders-of-magnitude better utility than local or shuffle-based mechanisms.

| Metric | Value (1,000 clients) | Scaling |
| --- | --- | --- |
| End-to-end runtime | ~4.4 s (semi-honest) | Linear in $k$ |
| Client communication | ~350 KB | Linear in $k$ |
| Server communication | ~350 MB | Linear in $k$ |
| Utility (RMSE) | Matches central HDMM | Robust to $\theta < 0.5$ |

Distributed HDMM generalizes and achieves the advantages of the centralized HDMM without a trusted curator by leveraging secure aggregation and careful noise coordination. Related protocols, such as the DMM protocol based on packed linear secret resharing (Bienstock et al., 2024), further extend the practicality of this approach to federated learning, providing constant-overhead per dimension for high-dimensional models and supporting dynamic client participation.

Distributed HDMM and contemporaneous distributed matrix mechanism protocols represent the state of the art for large-scale, distributed, differentially private analytics on high-dimensional data, combining secure multiparty computation with matrix-mechanism-based noise strategies to yield strong utility and privacy trade-offs (Sedimo et al., 17 Dec 2025, Bienstock et al., 2024).
