Distributed HDMM: Private, Secure Matrix Analytics
- Distributed HDMM is a protocol for privately answering high-dimensional linear queries in distributed settings by combining secure aggregation with a matrix mechanism.
- It operates in three rounds—strategy broadcast, local noise injection with secure aggregation, and decoding—to accurately reconstruct query responses.
- Empirical evaluations show that Distributed HDMM achieves near-centralized accuracy with scalable performance and significantly lower error than local or shuffle-based methods.
The Distributed High-Dimensional Matrix Mechanism (Distributed HDMM) is a protocol that enables differentially private answering of linear query workloads over high-dimensional distributed data, achieving the accuracy of centralized matrix mechanisms without relying on a trusted curator. Distributed HDMM integrates secure aggregation protocols with the matrix mechanism to guarantee privacy and robustness in adversarially controlled environments, enabling practical deployment in scenarios with thousands of clients and large, complex query workloads (Sedimo et al., 17 Dec 2025).
1. Problem Formulation
Distributed HDMM addresses the private computation of linear query workloads over distributed datasets. Consider $n$ clients, where client $i$ holds a record multiset $D_i$ drawn from a domain of size $d$. The distributed dataset $D$ is the multiset union of the local $D_i$. Each $D_i$ is encoded by its histogram vector $x_i \in \mathbb{N}^d$, such that the global histogram is $x = \sum_{i=1}^{n} x_i$.
Given a query workload specified by a matrix $W \in \mathbb{R}^{m \times d}$, the goal is to privately approximate the true workload answers $Wx$ under zero-concentrated differential privacy (zCDP) with parameter $\rho$ (convertible to standard $(\varepsilon, \delta)$-DP as needed).
This extends the classical HDMM—previously requiring a trusted server holding $x$—to settings where data remains decentralized and no single party observes all records, while preserving strong privacy and utility guarantees (Sedimo et al., 17 Dec 2025).
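As a toy illustration of this setup, the following sketch (hypothetical data and workload, not taken from the paper) encodes per-client multisets as histograms and evaluates a workload $W$ on the global histogram:

```python
import numpy as np

# Hypothetical toy instance: domain of size d = 4, n = 3 clients.
d, n = 4, 3

# Each client holds a multiset of records, encoded here as domain indices.
client_records = [[0, 0, 2], [1, 3], [2, 2, 3]]

# Histogram encoding: x_i[v] counts occurrences of domain value v at client i.
x = [np.bincount(r, minlength=d) for r in client_records]
x_global = np.sum(x, axis=0)           # global histogram x = sum_i x_i

# A workload W of linear queries: the identity (point counts)
# plus a single total-count query.
W = np.vstack([np.eye(d), np.ones((1, d))])

true_answers = W @ x_global            # the quantity to approximate privately
```

The protocol's goal is to release a noisy version of `true_answers` without any party ever seeing the individual `x[i]`.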
2. Distributed HDMM Protocol
Distributed HDMM operates in three rounds within the $\mathcal{F}_{\mathrm{SecAgg}}$-hybrid model, where $\mathcal{F}_{\mathrm{SecAgg}}$ denotes an ideal secure aggregation functionality:
1. Strategy Computation and Broadcast:
The server computes an optimal strategy matrix $A \in \mathbb{R}^{p \times d}$, for some $p$, selected to minimize the expected error under HDMM. The $\ell_2$-sensitivity $\Delta_A = \max_{x \sim x'} \|Ax - Ax'\|_2$ (where $x \sim x'$ differ in one record) is computed; since neighboring histograms differ by one in a single coordinate, this equals the maximum $\ell_2$ column norm of $A$. The server broadcasts $A$ and $\Delta_A$ to all clients.
2. Local Measurement, Noise Injection, and Secure Aggregation:
Each client $i$ (a) computes $y_i = A x_i$, (b) discretizes by rounding, $\tilde y_i = \lfloor c\, y_i \rceil$, for a large scaling factor $c$, (c) adds discrete Gaussian noise $\eta_i \sim \mathcal{N}_{\mathbb{Z}}(0, \sigma^2)$ per coordinate, where
$$\sigma^2 = \frac{c^2 \Delta_A^2}{2\rho\,(1-\gamma)\,n}$$
and $\gamma$ bounds the corrupted (non-noise-contributing) client fraction, (d) forms $z_i = \tilde y_i + \eta_i$, (e) reduces mod a prime $q$, and (f) submits the result via secure aggregation. The server learns only the sum $\sum_i z_i \bmod q$.
3. Decoding and Post-Processing:
The server decodes the aggregate $z = \sum_i z_i \bmod q$, inverts the mod-$q$ reduction and the scaling by $c$, yielding
$$\hat y = \frac{z}{c} \approx Ax + \frac{1}{c}\sum_i \eta_i,$$
and finally reconstructs the workload answers as $\hat a = W A^{+} \hat y$, where $A^{+}$ denotes the pseudoinverse of $A$, releasing $\hat a$ to the analyst (Sedimo et al., 17 Dec 2025).
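The three rounds above can be simulated end to end. The sketch below is illustrative only: all parameters are invented, the identity strategy stands in for an optimized $A$, secure aggregation is replaced by a plain running modular sum (the server sees only that sum), and rounded continuous Gaussian noise stands in for the discrete Gaussian the protocol actually specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters (all hypothetical): domain size d, n clients, scaling
# factor c, prime modulus q, zCDP budget rho, corrupted-fraction bound gamma.
d, n = 4, 100
c, q = 100, 2**31 - 1
rho, gamma = 0.5, 0.1

# Round 1: server picks a strategy A (identity here, for simplicity)
# and broadcasts it with its L2 sensitivity (max column norm).
A = np.eye(d)
sensitivity = np.max(np.linalg.norm(A, axis=0))

# Per-client noise variance: the central target c^2 * Delta^2 / (2 rho),
# split across the (1 - gamma) * n clients guaranteed honest.
sigma2 = (c * sensitivity) ** 2 / (2 * rho * (1 - gamma) * n)

# Round 2: each client measures, scales and rounds, adds noise, reduces mod q.
def client_message(x_i):
    y_i = A @ x_i
    z_i = np.rint(c * y_i) + np.rint(rng.normal(0, np.sqrt(sigma2), d))
    return z_i.astype(np.int64) % q

clients = [rng.integers(0, 3, size=d) for _ in range(n)]
agg = np.zeros(d, dtype=np.int64)
for x_i in clients:        # the server observes only this running modular sum
    agg = (agg + client_message(x_i)) % q

# Round 3: lift field elements back to signed integers (values near q are
# negatives), undo the scaling, and apply W A^+ to reconstruct the answers.
lifted = np.where(agg > q // 2, agg - q, agg).astype(float)
y_hat = lifted / c
W = np.eye(d)              # trivial workload for the demo
answers = W @ np.linalg.pinv(A) @ y_hat
```

With these toy settings, `answers` tracks the true histogram sum up to the injected Gaussian noise, whose scale shrinks as the honest-client count grows.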
3. Differential Privacy Guarantees
Distributed HDMM achieves $\rho$-zCDP for the privatized workload output. If each client adds discrete Gaussian noise with variance $\sigma^2$ as above, the $(1-\gamma)n$ honest clients collectively ensure that the output sum is equivalent to outputting $c\,Ax$ plus discrete Gaussian noise of variance $(1-\gamma)n\,\sigma^2 = c^2\Delta_A^2/(2\rho)$ in each coordinate.
Using the distributed discrete-Gaussian lemma [Kairouz et al., 2021], the aggregate noise preserves $\rho$-zCDP up to an exponentially small correction. Applying the standard zCDP-to-$(\varepsilon,\delta)$-DP conversion by tail-bounding yields
$$\varepsilon = \rho + 2\sqrt{\rho \ln(1/\delta)}$$
for any $\delta > 0$.
Thus, Distributed HDMM’s privacy guarantee nearly matches the central model—assuming an honest majority and correct local implementation of noise injection by clients—without reliance on a trusted aggregator (Sedimo et al., 17 Dec 2025).
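The zCDP-to-DP conversion above is standard and easy to evaluate numerically; the snippet below is a generic sketch, not code from the paper:

```python
import math

def zcdp_to_dp(rho: float, delta: float) -> float:
    """Standard conversion: rho-zCDP implies (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * ln(1/delta))."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

# Example: a budget of rho = 0.5 at delta = 1e-9.
eps = zcdp_to_dp(0.5, 1e-9)   # ~6.94
```

Tightening $\delta$ only costs $O(\sqrt{\log(1/\delta)})$ in $\varepsilon$, which is why zCDP accounting composes gracefully across many measurements.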
4. Security and Threat Model
Distributed HDMM assumes an adversarial environment comprising an untrusted server and up to $\gamma n$ malicious clients, with at least $(1-\gamma)n$ honest clients to provide the noise necessary for privacy. The secure aggregation protocol ensures no participant learns any individual client's contribution $z_i$.
- In the semi-honest model, all parties follow the protocol, but may try to infer additional information. Privacy is retained.
- In the malicious model, some clients or the server may actively deviate from the protocol. Confidentiality remains, but correctness is not assured unless clients also provide zero-knowledge proofs of input well-formedness (e.g., ACORN, EIFFeL).
- The server can select a non-optimal strategy $A$, but since the sensitivity $\Delta_A$ is computed locally by clients, their local noise calibration ensures privacy for any broadcast $A$ (Sedimo et al., 17 Dec 2025).
- Honest-majority noise: at least $(1-\gamma)n$ honest, noise-contributing clients are required for privacy.
5. Computational and Communication Complexity
Let $p$ denote the number of measurements (rows in $A$):
- Client computation: $O(pd)$ for computing $y_i = A x_i$, plus $O(p\,\mathrm{polylog}\,n)$ for secure aggregation masking; total $O(pd + p\,\mathrm{polylog}\,n)$.
- Client communication: $O(p)$ field elements per round.
- Server computation: $O(np)$ for all-client aggregation and strategy optimization; $O(n\,\mathrm{polylog}\,n)$ for unmasking.
- Server communication: $O(np)$ field elements.
As secure aggregation scales polylogarithmically with $n$, Distributed HDMM is practical even for thousands to millions of clients. The total overhead scales linearly in $p$ and $d$ (i.e., in the number of queries and the size of the measurement matrix), but only polylogarithmically in $n$ (Sedimo et al., 17 Dec 2025).
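These asymptotics can be made concrete with a back-of-the-envelope cost model. The parameters, the 8-byte field element, and the squared-log masking exponent below are all assumptions for illustration, not figures from the paper:

```python
import math

def client_costs(p: int, d: int, n: int, elem_bytes: int = 8):
    """Rough per-client cost model: O(p*d) multiply-adds for y_i = A @ x_i,
    O(p) field elements of upload, and polylog(n) masking work per element
    (the polylog exponent here is an assumed placeholder)."""
    compute_ops = p * d
    comm_bytes = p * elem_bytes
    masking_ops = p * math.ceil(math.log2(n)) ** 2
    return compute_ops, comm_bytes, masking_ops

ops, upload, masking = client_costs(p=1000, d=500, n=1000)
```

Doubling the client count $n$ barely moves the per-client figures, while doubling the number of measurements $p$ doubles all three, matching the scaling claims above.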
6. Empirical Evaluation
Sedimo et al. implemented Distributed HDMM using the Olympia simulator, evaluating on:
- Census SF1: Thousands of high-dimensional counting queries over the 2010 U.S. Census summary file.
- Adult (UCI): Two-way marginal queries.
Experiments with 1,000 clients examined both semi-honest and malicious settings at a fixed zCDP budget $\rho$ (reported alongside its equivalent $(\varepsilon, \delta)$-DP guarantee). Key findings:
- Runtime: For 1,000 clients, semi-honest Distributed HDMM completes end-to-end in 4.4 s; average client cost is 80 ms and server cost 4 s. Under the malicious model, total runtime remains under 10 s.
- Communication: Per-client cost is ~350 KB; the server receives ~350 MB in total for 1,000 clients.
- Utility: RMSE matches central HDMM when all clients are honest; error increases slowly with the corrupted fraction $\gamma$, remaining within a small constant factor. Local-DP and shuffle-model baselines incur orders-of-magnitude higher error (Sedimo et al., 17 Dec 2025).
This suggests Distributed HDMM achieves near-optimal accuracy with orders-of-magnitude better utility than local or shuffle-based mechanisms.
| Metric | Value (1,000 clients) | Scaling |
|---|---|---|
| End-to-end runtime | ~4.4 s (semi-honest) | Linear in $n$ |
| Client comm. | ~350 KB | Linear in $p$ |
| Server comm. | ~350 MB | Linear in $n$ |
| Utility (RMSE) | Matches central HDMM | Robust to $\gamma$ |
7. Extensions and Related Work
Distributed HDMM generalizes the centralized HDMM, achieving its accuracy advantages without a trusted curator by leveraging secure aggregation and careful noise coordination. Related protocols, such as the DMM protocol based on packed linear secret resharing (Bienstock et al., 2024), further extend the practicality of this approach to federated learning, providing constant overhead per dimension for high-dimensional models and supporting dynamic client participation.
Distributed HDMM and contemporaneous distributed matrix mechanism protocols represent the state of the art for large-scale, distributed, differentially private analytics on high-dimensional data, combining secure multiparty computation with matrix-mechanism-based noise strategies to yield strong utility and privacy trade-offs (Sedimo et al., 17 Dec 2025, Bienstock et al., 2024).