
Differentially Private Federated Learning: A Client Level Perspective

Published 20 Dec 2017 in cs.CR, cs.LG, and stat.ML | (1712.07557v2)

Abstract: Federated learning is a recent advance in privacy protection. In this context, a trusted curator aggregates parameters optimized in decentralized fashion by multiple clients. The resulting model is then distributed back to all clients, ultimately converging to a joint representative model without explicitly having to share the data. However, the protocol is vulnerable to differential attacks, which could originate from any party contributing during federated optimization. In such an attack, a client's contribution during training and information about their data set is revealed through analyzing the distributed model. We tackle this problem and propose an algorithm for client sided differential privacy preserving federated optimization. The aim is to hide clients' contributions during training, balancing the trade-off between privacy loss and model performance. Empirical studies suggest that given a sufficiently large number of participating clients, our proposed procedure can maintain client-level differential privacy at only a minor cost in model performance.

Citations (1,189)

Summary

  • The paper presents a novel framework that secures entire client datasets from differential attacks in federated learning.
  • It employs randomized client sub-sampling and Gaussian noise distortion, with a moments accountant to dynamically balance privacy and model performance.
  • Experiments on MNIST reveal that with thousands of clients, the DP model attains accuracy nearly comparable to non-DP models, confirming practical viability.


Federated learning is a transformative approach that prioritizes privacy by decentralizing the learning process across multiple clients. Geyer, Klein, and Nabi's paper, "Differentially Private Federated Learning: A Client Level Perspective," addresses the critical vulnerability in this paradigm—differential attacks that can reveal sensitive client data through the shared model parameters. This paper proposes an algorithm that promotes client-level differential privacy (DP) in federated learning while maintaining high model performance.

Key Contributions

1. Client-Level Differential Privacy:

In contrast to existing DP methods, which focus on the privacy of individual data points, this paper's approach ensures that an entire client's dataset remains private. By incorporating a DP-preserving mechanism at the client level, the authors create a federated learning protocol that hides whether any particular client participated in the training process.

2. Performance Retention with Minor Loss:

Through empirical studies, the authors demonstrate that their algorithm can preserve client-level DP with only a minor sacrifice in model performance, assuming a sufficiently large number of participating clients. The procedure dynamically adapts the DP-preserving mechanism during decentralized training, which diverges from conventional centralized DP approaches where such adaptations were not beneficial.

Methodological Insights

The paper proposes a randomized mechanism with two main steps: random sub-sampling of clients and distortion via a Gaussian mechanism (GM):

  • Random Sub-Sampling: At each communication round, a random subset of clients is selected to receive and optimize the central model. The differences between the local models and the central model (termed updates) are then sent back to the central curator.
  • Distorting with a Gaussian Mechanism: The sum of these updates is distorted by adding Gaussian noise calibrated to the updates' sensitivity. By clipping each update to a prescribed L2-norm bound S before adding noise, the sensitivity remains controlled, ensuring the DP guarantees.
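One such communication round can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function and parameter names are assumptions.

```python
import numpy as np

def private_round(client_updates, S, sigma, rng=None):
    """One communication round of client-level DP aggregation (sketch).

    client_updates: list of 1-D numpy arrays (local model minus central model)
    S: clipping bound on each update's L2 norm (controls sensitivity)
    sigma: noise multiplier; Gaussian noise has standard deviation sigma * S
    """
    rng = rng or np.random.default_rng()
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        # Scale down any update whose L2 norm exceeds the bound S.
        clipped.append(u * min(1.0, S / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Distort the summed updates with noise calibrated to the sensitivity S.
    noisy = total + rng.normal(0.0, sigma * S, size=total.shape)
    # The curator averages over the sampled clients before updating the model.
    return noisy / len(client_updates)
```

With `sigma = 0` the function reduces to plain clipped averaging, which makes the role of the clipping bound easy to verify in isolation.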

The authors employ a moments accountant to track DP loss, which offers tighter bounds than standard composition theorems, ensuring training halts when the DP threshold is reached.
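The halting logic can be illustrated with a deliberately simplified accountant. The sketch below uses naive linear composition (per-round losses simply add up), purely to show the control flow; the paper's moments accountant yields much tighter cumulative bounds, which is the point of using it.

```python
def train_with_budget(rounds_max, eps_per_round, eps_budget):
    """Halt training once cumulative privacy loss would exceed the budget.

    Illustrative only: assumes naive linear composition of per-round
    epsilon costs. A moments accountant tracks the loss more tightly,
    allowing more rounds under the same budget.
    """
    eps_spent = 0.0
    for t in range(rounds_max):
        if eps_spent + eps_per_round > eps_budget:
            return t  # rounds completed before the budget would be exceeded
        # ... one private communication round would run here ...
        eps_spent += eps_per_round
    return rounds_max
```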

Experimental Validation

Experiments employed the well-known MNIST dataset in a federated setting with simulations on varying client populations: 100, 1,000, and 10,000 clients. The results reveal:

  • For smaller client populations (100 and 1,000), model accuracy under DP constraints fell significantly below that of non-DP models, but remained substantially better than what individual clients could achieve training on their own data alone.
  • For a large number of clients (10,000), DP models achieved accuracy comparable to non-DP models, suggesting the practicality of the approach in widespread real-world applications such as mobile and consumer devices.

The paper also introduces metrics such as the between-client variance V_c and the update scale U_s, illustrating that the federated setting benefits from dynamically varying the number of participating clients and adapting the Gaussian noise parameters as training progresses.
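These two quantities can be computed from a matrix of client updates along the lines below. The exact definitions in the paper may differ in normalization; this sketch assumes V_c is the per-parameter variance across clients averaged over parameters, and U_s is the mean squared update magnitude.

```python
import numpy as np

def client_stats(updates):
    """Illustrative between-client variance and update scale (assumed forms).

    updates: array of shape (num_clients, num_params); row i holds
    client i's model update for one communication round.
    """
    updates = np.asarray(updates, dtype=float)
    # Variance of each parameter's update across clients, averaged over parameters.
    V_c = np.mean(np.var(updates, axis=0))
    # Overall scale of the updates (mean squared magnitude).
    U_s = np.mean(updates ** 2)
    return V_c, U_s
```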

Implications and Future Developments

Practical Implications:

Implementing client-level DP in federated learning is particularly promising in domains like healthcare, where several institutions can jointly train models without compromising individual data privacy. This method can substantially enhance the privacy-preserving capabilities of federated learning systems used in everyday applications.

Theoretical Advancements and Future Research:

The dynamic adaptation of DP mechanisms presents an interesting divergence from traditional centralized approaches. The authors hint at a deeper connection to information theory, suggesting future studies could involve deriving optimal bounds for signal-to-noise ratios related to communication rounds, data representativity, and between-client variance.

Conclusion

The paper by Geyer, Klein, and Nabi advances the field by providing a robust framework for achieving client-level differential privacy in federated learning, ensuring high performance where numerous clients are involved. The empirical evaluation reaffirms the practicality and minimal performance trade-offs of the proposed algorithm. Future work should aim to refine these methods, optimizing privacy budgets and further integrating theories from information science to solidify these initial findings.

By addressing both practical and theoretical challenges, this research opens new avenues for privacy-preserving collaborative learning, making significant strides towards secure and efficient federated learning systems.
