An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits

Published 29 Oct 2023 in cs.LG, cs.DS, and stat.ML | (2310.19025v2)

Abstract: We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of $O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls per round to an offline optimization oracle, where $K$ denotes the number of actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of policies. This is the first result to improve the prior best bound of $O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007 which was obtained for the stochastic case.
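
To make the size of the improvement concrete, the ratio of the prior bound to the new one can be worked out directly from the two expressions stated in the abstract. This is only an order-of-magnitude comparison: it assumes the constants hidden by the $O(\cdot)$ notation are ignored.

$$\frac{(TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}}}{T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}}} = \frac{K^{\frac{2}{3}}}{K^{\frac{1}{3}}} = K^{\frac{1}{3}}$$

Under that reading, the new bound is smaller by a factor of $K^{\frac{1}{3}}$, with the dependence on $T$ and $\log(|\Pi|)$ unchanged. As an illustrative case, $K = 8$ actions would give a factor of $8^{\frac{1}{3}} = 2$.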