Papers
Topics
Authors
Recent
Search
2000 character limit reached

Fully Gap-Dependent Bounds for Multinomial Logit Bandit

Published 19 Nov 2020 in cs.LG | (2011.09998v1)

Abstract: We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to a MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S*$ within $\widetilde{O}(\sum_{i = 1}N \Delta_i{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S*} K\Delta_i{-1} \log T)$ regret in $T$ time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depends on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-$K$ arm identification problem in multi-armed bandits, a generalized epoch-based offering procedure, and a layer-based adaptive estimation procedure.

Citations (2)

Summary

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.