Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

Published 3 Jul 2023 in cs.LG and cs.CR | (2307.00863v1)

Abstract: This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. Given a fixed privacy budget $\epsilon$, we consider three privatizing mechanisms under Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive stochastic regret bound for Thompson Sampling algorithm. Finally, we simulate to illustrate the convergence of different mechanisms under different privacy budgets.