Distributed HDMM: Private, Secure Matrix Analytics
- Distributed HDMM is a protocol for privately answering high-dimensional linear queries in distributed settings by combining secure aggregation with a matrix mechanism.
- It operates in three rounds—strategy broadcast, local noise injection with secure aggregation, and decoding—to accurately reconstruct query responses.
- Empirical evaluations show that Distributed HDMM achieves near-centralized accuracy with scalable performance and significantly lower error than local or shuffle-based methods.
The Distributed High-Dimensional Matrix Mechanism (Distributed HDMM) is a protocol that enables differentially private answering of linear query workloads over high-dimensional distributed data, achieving the accuracy of centralized matrix mechanisms without relying on a trusted curator. Distributed HDMM integrates secure aggregation protocols with the matrix mechanism to guarantee privacy and robustness in adversarially controlled environments, enabling practical deployment in scenarios with thousands of clients and large, complex query workloads (Sedimo et al., 17 Dec 2025).
1. Problem Formulation
Distributed HDMM addresses the private computation of linear query workloads over distributed datasets. Consider $n$ clients, where client $i$ holds a record multiset $D_i$ drawn from a domain of size $d$. The distributed dataset $D$ is the multiset union of the local $D_i$. Each $D_i$ is encoded by its histogram vector $x_i \in \mathbb{N}^d$, such that the global histogram is $x = \sum_{i=1}^{n} x_i$.
Given a query workload specified by a matrix $W \in \mathbb{R}^{m \times d}$, the goal is to privately approximate the true workload answers $Wx$ under zero-concentrated differential privacy (zCDP) with parameter $\rho$ (convertible to standard $(\varepsilon, \delta)$-DP as needed).
This extends the classical HDMM—previously requiring a trusted server holding $x$—to settings where data remains decentralized and no single party observes all records, while preserving strong privacy and utility guarantees (Sedimo et al., 17 Dec 2025).
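As a toy illustration of this setup, the following sketch (hypothetical data and workload, not taken from the paper) encodes per-client multisets as histograms and evaluates a workload $W$ on the global histogram:

```python
import numpy as np

# Hypothetical toy instance: domain of size d = 4, n = 3 clients.
d, n = 4, 3

# Each client holds a multiset of records, encoded here as domain indices.
client_records = [[0, 0, 2], [1, 3], [2, 2, 3]]

# Histogram encoding: x_i[v] counts occurrences of domain value v at client i.
x = [np.bincount(r, minlength=d) for r in client_records]
x_global = np.sum(x, axis=0)           # global histogram x = sum_i x_i

# A workload W of linear queries: the identity (point counts)
# plus a single total-count query.
W = np.vstack([np.eye(d), np.ones((1, d))])

true_answers = W @ x_global            # the quantity to approximate privately
```

The protocol's goal is to release a noisy version of `true_answers` without any party ever seeing the individual `x[i]`.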
2. Distributed HDMM Protocol
Distributed HDMM operates in three rounds within the $\mathcal{F}_{\mathrm{SecAgg}}$-hybrid model, where $\mathcal{F}_{\mathrm{SecAgg}}$ denotes an ideal secure aggregation functionality:
1. Strategy Computation and Broadcast:
The server computes an optimal strategy matrix $A \in \mathbb{R}^{p \times d}$, for some $p$, selected to minimize the expected error under HDMM. The $\ell_2$-sensitivity $\Delta_A = \max_{x \sim x'} \|Ax - Ax'\|_2$ (where $x \sim x'$ differ in one record) is computed; since neighboring histograms differ by one in a single coordinate, this equals the maximum $\ell_2$ column norm of $A$. The server broadcasts $A$ and $\Delta_A$ to all clients.
2. Local Measurement, Noise Injection, and Secure Aggregation:
Each client $i$ (a) computes $y_i = A x_i$, (b) discretizes by rounding, $\tilde y_i = \lfloor c\, y_i \rceil$, for a large scaling factor $c$, (c) adds discrete Gaussian noise $\eta_i \sim \mathcal{N}_{\mathbb{Z}}(0, \sigma^2)$ per coordinate, where
$$\sigma^2 = \frac{c^2 \Delta_A^2}{2\rho\,(1-\gamma)\,n}$$
and $\gamma$ bounds the corrupted (non-noise-contributing) client fraction, (d) forms $z_i = \tilde y_i + \eta_i$, (e) reduces mod a prime $q$, and (f) submits the result via secure aggregation. The server learns only the sum $\sum_i z_i \bmod q$.
3. Decoding and Post-Processing:
The server decodes the aggregate $z = \sum_i z_i \bmod q$, inverts the mod-$q$ reduction and the scaling by $c$, yielding
$$\hat y = \frac{z}{c} \approx Ax + \frac{1}{c}\sum_i \eta_i,$$
and finally reconstructs the workload answers as $\hat a = W A^{+} \hat y$, where $A^{+}$ denotes the pseudoinverse of $A$, releasing $\hat a$ to the analyst (Sedimo et al., 17 Dec 2025).
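The three rounds above can be simulated end to end. The sketch below is illustrative only: all parameters are invented, the identity strategy stands in for an optimized $A$, secure aggregation is replaced by a plain running modular sum (the server sees only that sum), and rounded continuous Gaussian noise stands in for the discrete Gaussian the protocol actually specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters (all hypothetical): domain size d, n clients, scaling
# factor c, prime modulus q, zCDP budget rho, corrupted-fraction bound gamma.
d, n = 4, 100
c, q = 100, 2**31 - 1
rho, gamma = 0.5, 0.1

# Round 1: server picks a strategy A (identity here, for simplicity)
# and broadcasts it with its L2 sensitivity (max column norm).
A = np.eye(d)
sensitivity = np.max(np.linalg.norm(A, axis=0))

# Per-client noise variance: the central target c^2 * Delta^2 / (2 rho),
# split across the (1 - gamma) * n clients guaranteed honest.
sigma2 = (c * sensitivity) ** 2 / (2 * rho * (1 - gamma) * n)

# Round 2: each client measures, scales and rounds, adds noise, reduces mod q.
def client_message(x_i):
    y_i = A @ x_i
    z_i = np.rint(c * y_i) + np.rint(rng.normal(0, np.sqrt(sigma2), d))
    return z_i.astype(np.int64) % q

clients = [rng.integers(0, 3, size=d) for _ in range(n)]
agg = np.zeros(d, dtype=np.int64)
for x_i in clients:        # the server observes only this running modular sum
    agg = (agg + client_message(x_i)) % q

# Round 3: lift field elements back to signed integers (values near q are
# negatives), undo the scaling, and apply W A^+ to reconstruct the answers.
lifted = np.where(agg > q // 2, agg - q, agg).astype(float)
y_hat = lifted / c
W = np.eye(d)              # trivial workload for the demo
answers = W @ np.linalg.pinv(A) @ y_hat
```

With these toy settings, `answers` tracks the true histogram sum up to the injected Gaussian noise, whose scale shrinks as the honest-client count grows.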
3. Differential Privacy Guarantees
Distributed HDMM achieves $\rho$-zCDP for the privatized workload output. If each client adds discrete Gaussian noise with variance $\sigma^2$ as above, the $(1-\gamma)n$ honest clients collectively ensure that the output sum is equivalent to outputting $c\,Ax$ plus discrete Gaussian noise of variance $(1-\gamma)n\,\sigma^2 = c^2\Delta_A^2/(2\rho)$ in each coordinate.
Using the distributed discrete-Gaussian lemma [Kairouz et al., 2021], the aggregate noise preserves $\rho$-zCDP up to an exponentially small correction. Applying the standard zCDP-to-$(\varepsilon,\delta)$-DP conversion by tail-bounding yields
$$\varepsilon = \rho + 2\sqrt{\rho \ln(1/\delta)}$$
for any $\delta > 0$.
Thus, Distributed HDMM’s privacy guarantee nearly matches the central model—assuming an honest majority and correct local implementation of noise injection by clients—without reliance on a trusted aggregator (Sedimo et al., 17 Dec 2025).
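The zCDP-to-DP conversion above is standard and easy to evaluate numerically; the snippet below is a generic sketch, not code from the paper:

```python
import math

def zcdp_to_dp(rho: float, delta: float) -> float:
    """Standard conversion: rho-zCDP implies (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * ln(1/delta))."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

# Example: a budget of rho = 0.5 at delta = 1e-9.
eps = zcdp_to_dp(0.5, 1e-9)   # ~6.94
```

Tightening $\delta$ only costs $O(\sqrt{\log(1/\delta)})$ in $\varepsilon$, which is why zCDP accounting composes gracefully across many measurements.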
4. Security and Threat Model
Distributed HDMM assumes an adversarial environment comprising an untrusted server and up to $\gamma n$ malicious clients, with at least $(1-\gamma)n$ honest clients to provide the noise necessary for privacy. The secure aggregation protocol ensures no participant learns any individual client's contribution $z_i$.
- In the semi-honest model, all parties follow the protocol, but may try to infer additional information. Privacy is retained.
- In the malicious model, some clients or the server may actively deviate from the protocol. Confidentiality remains, but correctness is not assured unless clients also provide zero-knowledge proofs of input well-formedness (e.g., ACORN, EIFFeL).
- The server can select a non-optimal strategy $A$, but since the sensitivity $\Delta_A$ is computed locally by clients, their local noise calibration ensures privacy for any broadcast $A$ (Sedimo et al., 17 Dec 2025).
- Honest-majority noise: at least $(1-\gamma)n$ honest, noise-contributing clients are required for privacy.
5. Computational and Communication Complexity
Let $p$ denote the number of measurements (rows in $A$):
- Client computation: $O(pd)$ for computing $y_i = A x_i$, plus $O(p\,\mathrm{polylog}\,n)$ for secure aggregation masking; total $O(pd + p\,\mathrm{polylog}\,n)$.
- Client communication: $O(p)$ field elements per round.
- Server computation: $O(np)$ for all-client aggregation and strategy optimization; $O(n\,\mathrm{polylog}\,n)$ for unmasking.
- Server communication: $O(np)$ field elements.
As secure aggregation scales polylogarithmically with $n$, Distributed HDMM is practical even for thousands to millions of clients. The total overhead scales linearly in $p$ and $d$ (i.e., in the number of queries and the size of the measurement matrix), but only polylogarithmically in $n$ (Sedimo et al., 17 Dec 2025).
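These asymptotics can be made concrete with a back-of-the-envelope cost model. The parameters, the 8-byte field element, and the squared-log masking exponent below are all assumptions for illustration, not figures from the paper:

```python
import math

def client_costs(p: int, d: int, n: int, elem_bytes: int = 8):
    """Rough per-client cost model: O(p*d) multiply-adds for y_i = A @ x_i,
    O(p) field elements of upload, and polylog(n) masking work per element
    (the polylog exponent here is an assumed placeholder)."""
    compute_ops = p * d
    comm_bytes = p * elem_bytes
    masking_ops = p * math.ceil(math.log2(n)) ** 2
    return compute_ops, comm_bytes, masking_ops

ops, upload, masking = client_costs(p=1000, d=500, n=1000)
```

Doubling the client count $n$ barely moves the per-client figures, while doubling the number of measurements $p$ doubles all three, matching the scaling claims above.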
6. Empirical Evaluation
Sedimo et al. implemented Distributed HDMM using the Olympia simulator, evaluating on:
- Census SF1: Thousands of high-dimensional counting queries over the 2010 U.S. Census summary file.
- Adult (UCI): Two-way marginal queries.
Experiments with 1,000 clients examined both semi-honest and malicious settings at a fixed zCDP budget $\rho$ (reported alongside its equivalent $(\varepsilon, \delta)$-DP guarantee). Key findings:
- Runtime: For 1,000 clients, semi-honest Distributed HDMM completes end-to-end in 4.4 s; average client cost is 80 ms and server cost 4 s. Under the malicious model, total runtime remains under 10 s.
- Communication: Per-client cost is ~350 KB; the server receives ~350 MB in total for 1,000 clients.
- Utility: RMSE matches central HDMM when all clients are honest; error increases slowly with the corrupted fraction $\gamma$, remaining within a small constant factor. Local-DP and shuffle-model baselines incur orders-of-magnitude higher error (Sedimo et al., 17 Dec 2025).
This suggests Distributed HDMM achieves near-optimal accuracy with orders-of-magnitude better utility than local or shuffle-based mechanisms.
| Metric | Value (1,000 clients) | Scaling |
|---|---|---|
| End-to-end runtime | ~4.4 s (semi-honest) | Linear in $n$ |
| Client comm. | ~350 KB | Linear in $p$ |
| Server comm. | ~350 MB | Linear in $n$ |
| Utility (RMSE) | Matches central HDMM | Robust to $\gamma$ |
7. Extensions and Related Work
Distributed HDMM generalizes the centralized HDMM, achieving its accuracy advantages without a trusted curator by leveraging secure aggregation and careful noise coordination. Related protocols, such as the DMM protocol based on packed linear secret resharing (Bienstock et al., 2024), further extend the practicality of this approach to federated learning, providing constant overhead per dimension for high-dimensional models and supporting dynamic client participation.
Distributed HDMM and contemporaneous distributed matrix mechanism protocols represent the state of the art for large-scale, distributed, differentially private analytics on high-dimensional data, combining secure multiparty computation with matrix-mechanism-based noise strategies to yield strong utility and privacy trade-offs (Sedimo et al., 17 Dec 2025, Bienstock et al., 2024).