An optimal algorithm for bandit convex optimization

Published 14 Mar 2016 in cs.LG and cs.DS | (1603.04350v2)

Abstract: We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting based on a novel application of the ellipsoid method to online learning. This bound is known to be tight up to logarithmic factors. Our analysis introduces new tools in discrete convex geometry.
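To make the setting concrete, the Python sketch below simulates the bandit feedback protocol and measures regret. In each round $t$ the learner plays $x_t$ in a convex set $K$. The adversary's convex loss $f_t$ is revealed only through the scalar $f_t(x_t)$, and regret is $\sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x)$.

The learner shown is not the ellipsoid-based algorithm of this paper. It is the simpler one-point gradient-estimate method of Flaxman, Kalai and McMahan (2005), which is known to achieve only $O(T^{3/4})$ regret. The one-dimensional domain $K = [-1, 1]$, the grid approximation of the comparator, and the names `regret`, `one_point_bandit`, `delta` and `eta` are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

# A loss function f_t: K -> R; convexity is assumed, not checked.
Loss = Callable[[float], float]


def regret(losses: Sequence[Loss], plays: Sequence[float], grid: Sequence[float]) -> float:
    """Sum of f_t(x_t) minus the total loss of the best fixed point.

    The comparator (min over K) is approximated by a grid search (assumption)."""
    learner_loss = sum(f(x) for f, x in zip(losses, plays))
    best_fixed = min(sum(f(y) for f in losses) for y in grid)
    return learner_loss - best_fixed


def one_point_bandit(losses: Sequence[Loss], delta: float = 0.1, eta: float = 0.01,
                     lo: float = -1.0, hi: float = 1.0, seed: int = 0) -> List[float]:
    """Hypothetical baseline on K = [lo, hi] using the one-point gradient
    estimate of Flaxman, Kalai and McMahan (2005); not the paper's method."""
    rng = random.Random(seed)
    y = (lo + hi) / 2.0  # iterate kept in the shrunk interval [lo + delta, hi - delta]
    plays: List[float] = []
    for f in losses:
        u = rng.choice((-1.0, 1.0))  # uniform direction on the 1-D unit sphere
        x = y + delta * u            # the point actually played, still inside K
        value = f(x)                 # bandit feedback: a single scalar, no gradient
        plays.append(x)
        g = (1.0 / delta) * value * u  # (d / delta) * f(y + delta u) * u with d = 1
        # Gradient step, then projection back onto the shrunk interval.
        y = min(max(y - eta * g, lo + delta), hi - delta)
    return plays


if __name__ == "__main__":
    # An oblivious adversary that always plays f_t(x) = (x - 0.5)^2.
    T = 2000
    losses = [lambda x: (x - 0.5) ** 2] * T
    plays = one_point_bandit(losses)
    grid = [i / 100.0 - 1.0 for i in range(201)]  # 201 points covering [-1, 1]
    print(f"regret after {T} rounds: {regret(losses, plays, grid):.3f}")
```

Running the module prints the measured regret for one random seed. The $1/\delta$ factor makes the one-point estimate high-variance. Trading that variance off against the bias of playing $\delta$ away from $y$ is what limits this baseline to $O(T^{3/4})$, in contrast to the $\tilde{O}(\sqrt{T})$ bound claimed above.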