
Generalizing DP-SGD with Shuffling and Batch Clipping

Published 12 Dec 2022 in cs.LG and cs.CR | (2212.05796v3)

Abstract: Classical differentially private DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach. We provide a general differentially private algorithmic framework that goes beyond DP-SGD and allows any possible first order optimizers (e.g., classical SGD and momentum based SGD approaches) in combination with batch clipping, which clips an aggregate of computed gradients rather than summing clipped gradients (as is done in individual clipping). The framework also admits sampling techniques beyond random subsampling such as shuffling. Our DP analysis follows the $f$-DP approach and introduces a new proof technique which allows us to derive simple closed form expressions and to also analyse group privacy. In particular, for $E$ epochs work and groups of size $g$, we show a $\sqrt{gE}$ DP dependency for batch clipping with shuffling.


Summary

  • The paper introduces a novel framework that replaces individual gradient clipping with batch clipping enhanced by shuffling, enabling any first-order optimizer.
  • The paper employs an f-DP approach yielding closed-form privacy expressions with a √(gE) dependency, providing rigorous group-level privacy guarantees.
  • The paper demonstrates competitive accuracy on CIFAR-10 and MNIST, matching traditional DP-SGD while offering enhanced flexibility and privacy analysis.


The paper "Generalizing DP-SGD with Shuffling and Batch Clipping" (2212.05796) presents a significant extension of the traditional Differentially Private Stochastic Gradient Descent (DP-SGD) framework. The extension enables the use of any first-order optimizer, such as classical SGD and momentum-based approaches, by implementing batch clipping in place of the standard individual clipping. Moreover, the framework admits sampling techniques beyond random subsampling, such as shuffling.

Differential Privacy Techniques

Algorithmic Framework

The authors introduce a differentially private algorithmic framework that generalizes beyond DP-SGD. The core idea is to replace individual gradient clipping with batch clipping: instead of clipping each per-example gradient and summing the results, an aggregate of the computed gradients is clipped once. Support for shuffling as an alternative to random subsampling opens up new possibilities for implementing differential privacy in machine learning contexts.
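To make the distinction concrete, the two clipping modes can be contrasted in a short numpy sketch. This is a simplified illustration, not the paper's exact recipe: the normalization convention, noise calibration, and the helper names `clip`, `individual_clipping`, and `batch_clipping` are assumptions made here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip(v, C):
    """Scale v down so its L2 norm is at most C."""
    n = np.linalg.norm(v)
    return v if n <= C else v * (C / n)

def individual_clipping(per_example_grads, C, sigma):
    """Classical DP-SGD style: clip each per-example gradient,
    sum the clipped gradients, then add Gaussian noise."""
    total = sum(clip(g, C) for g in per_example_grads)
    noise = sigma * C * rng.standard_normal(total.shape)
    return (total + noise) / len(per_example_grads)

def batch_clipping(per_example_grads, C, sigma):
    """Batch clipping as in the proposed framework: aggregate
    (here, average) the gradients first, clip the aggregate once,
    then add Gaussian noise."""
    avg = np.mean(per_example_grads, axis=0)
    return clip(avg, C) + sigma * C * rng.standard_normal(avg.shape)
```

Because batch clipping touches only the aggregate, the per-example gradient computation never needs to be materialized separately, which is what lets arbitrary first-order optimizers slot into the framework.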

DP Analysis Using the $f$-DP Approach

The paper employs an $f$-DP approach for the differential privacy analysis. This method is significant as it introduces a novel proof technique allowing for the derivation of closed-form expressions, particularly useful in assessing the privacy guarantees at the group level. Specifically, for $E$ epochs of work and groups of size $g$, the work shows a $\sqrt{gE}$ dependency in privacy for batch clipping with shuffling.
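The $\sqrt{gE}$ scaling can be motivated by a standard $f$-DP fact: Gaussian trade-off functions compose by root-sum-of-squares. The bookkeeping below is a schematic sketch, with a constant $c$ standing in for the paper's exact normalization, not a reproduction of its proof.

```latex
% Composition of Gaussian trade-off functions in f-DP:
%   G_{\mu_1} \otimes \cdots \otimes G_{\mu_k} = G_{\sqrt{\mu_1^2 + \cdots + \mu_k^2}}.
% If each of the E epochs contributes one Gaussian mechanism of parameter
% c/\sigma for each of the g group members (c is an illustrative constant),
% the composed mechanism has parameter
\mu_{\mathrm{total}}
  = \sqrt{\textstyle\sum_{i=1}^{gE} (c/\sigma)^2}
  = \frac{c\,\sqrt{gE}}{\sigma},
% which exhibits the \sqrt{gE} dependency stated in the paper.
```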

Experimental Results

The experimental setup demonstrates how the proposed framework compares against traditional DP-SGD with individual clipping. Key findings include:

  • CIFAR-10 Testing Accuracy: The batch clipping with shuffling achieves about 71.5% accuracy, compared to 71.1% accuracy using individual clipping with subsampling.
  • MNIST Testing Accuracy: Results show 98.3% accuracy with batch clipping and shuffling, comparable to 98.4% with individual clipping.

Figure 1: CIFAR10 and MNIST testing accuracy for different batch sizes, comparing SubSampling (SS) and SHuffling (SH) with Individual Clipping (IC), Batch Clipping (BC), and Mixed Clipping (MC).

Implications and Future Work

Theoretical and Practical Implications

This work expands the current understanding of differential privacy in machine learning by enabling more versatile optimization strategies. The introduction of batch clipping and shuffling can lead to performance benefits and potentially better privacy guarantees, as suggested by the $\sqrt{gE}$ dependency in group privacy.

Future Directions

The framework sets the stage for further research into differential privacy using other sampling techniques and further generalizations. Exploring different first-order optimizers within this context and varying the clipping strategies could yield improvements in both privacy and model performance.

Conclusion

The study effectively broadens the scope of DP-SGD by incorporating advanced algorithms and sampling methods. While current results show promising advancements in the accuracy of model outputs while maintaining privacy guarantees, future research can further enhance these techniques and provide stronger theoretical and practical insights into differential privacy in machine learning.
