Gittins' theorem under uncertainty

Published 12 Jul 2019 in math.OC, math.PR, math.ST, q-fin.CP, and stat.TH | (1907.05689v3)

Abstract: We study dynamic allocation problems for discrete-time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the willingness to explore and uncertainty aversion of the agent when making decisions.