Papers
Topics
Authors
Recent
Search
2000 character limit reached

Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions

Published 21 Oct 2019 in cs.LG, cs.GT, and stat.ML | (1910.09089v4)

Abstract: We consider a fully decentralized multi-player stochastic multi-armed bandit setting where the players cannot communicate with each other and can observe only their own actions and rewards. The environment may appear differently to different players, $\textit{i.e.}$, the reward distributions for a given arm are heterogeneous across players. In the case of a collision (when more than one player plays the same arm), we allow for the colliding players to receive non-zero rewards. The time-horizon $T$ for which the arms are played is \emph{not} known to the players. Within this setup, where the number of players is allowed to be greater than the number of arms, we present a policy that achieves near order-optimal expected regret of order $O(\log{1 + \delta} T)$ for some $0 < \delta < 1$ over a time-horizon of duration $T$. This paper is accepted at IEEE Transactions on Information Theory.

Citations (15)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.