Optimally Confident UCB: Improved Regret for Finite-Armed Bandits
Published 28 Jul 2015 in cs.LG and math.OC (arXiv:1507.07880v3)
Abstract: I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and empirically superb. The approach is based on UCB, but with a carefully chosen confidence parameter that optimally balances the risk of failing confidence intervals against the cost of excessive optimism.
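To make the role of the confidence parameter concrete, the following is a minimal sketch of a UCB-style loop in which the logarithmic term inside the exploration bonus is pluggable. It is not the paper's algorithm as stated: the abstract does not give the index, so the function names (`ucb_run`, `classic_log`, `horizon_log`), the parameters `alpha` and `psi`, the unit-variance Gaussian rewards, and the horizon-dependent choice `log(psi * horizon / t)` are all assumptions made for illustration.

```python
import math
import random


def classic_log(t, horizon):
    # Classical UCB-style confidence level: grows with the round index t.
    return math.log(t)


def horizon_log(t, horizon, psi=2.0):
    # Illustrative horizon-dependent confidence level (an assumption, not
    # taken from the abstract): shrinks as t approaches the horizon.
    return math.log(psi * horizon / t)


def ucb_run(means, horizon, log_term, alpha=2.0, seed=0):
    """Play a unit-variance Gaussian bandit with a UCB-style index.

    `log_term(t, horizon)` supplies the confidence level inside the bonus.
    Returns (pseudo_regret, pull_counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull every arm once
        else:
            level = log_term(t, horizon)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(alpha / counts[i] * level),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # pseudo-regret uses the true means

    return regret, counts


if __name__ == "__main__":
    arms = [0.0, 0.5, 1.0]
    for name, fn in [("classic", classic_log), ("horizon", horizon_log)]:
        r, c = ucb_run(arms, horizon=2000, log_term=fn)
        print(name, round(r, 1), c)
```

A larger confidence level makes failed confidence intervals rarer but spends more pulls on suboptimal arms; a smaller one does the reverse. Swapping `log_term` is a simple way to observe that trade-off on a toy problem, under the assumptions listed above.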