
Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization

Published 29 Feb 2024 in cs.LG and cs.IR | (2402.18917v1)

Abstract: We address the problem of active online assortment optimization with preference feedback, a framework for modeling user choices and subsetwise utility maximization. The framework is useful in various real-world applications including ad placement, online retail, recommender systems, and fine-tuning LLMs, amongst many others. Although the problem has been studied in the past, it lacks an intuitive and practical solution approach that simultaneously yields an efficient algorithm and an optimal regret guarantee. For example, popularly used assortment selection algorithms often require the presence of a `strong reference' item that is always included in the choice sets; further, they are designed to offer the same assortment repeatedly until the reference item gets selected -- requirements that are quite unrealistic for practical applications. In this paper, we design efficient algorithms for regret minimization in assortment selection with \emph{Plackett-Luce} (PL) based user choices. We derive a novel concentration guarantee for estimating the score parameters of the PL model using \emph{Pairwise Rank-Breaking}, which builds the foundation of our proposed algorithms. Moreover, our methods are practical, provably optimal, and devoid of the aforementioned limitations of existing methods. Empirical evaluations corroborate our findings and show that our methods outperform the existing baselines.
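To make the two central ingredients of the abstract concrete, here is a minimal sketch of (a) the Plackett-Luce choice model, where the probability of picking item i from an assortment is its score divided by the total score of the offered set, and (b) pairwise rank-breaking, where full rankings are decomposed into pairwise comparisons whose win fractions estimate the pairwise PL preferences. This is an illustrative sketch only, not the paper's algorithm; all function names are hypothetical and the concentration analysis is omitted.

```python
import random

def pl_choice_probs(scores):
    """Plackett-Luce choice: P(pick i from assortment S) = theta_i / sum_{j in S} theta_j.
    `scores` maps item -> positive score theta_i for the offered assortment."""
    total = sum(scores.values())
    return {item: theta / total for item, theta in scores.items()}

def sample_ranking(scores, rng):
    """Draw a full PL ranking by repeated choice without replacement:
    pick the top item PL-style, remove it, and repeat on the remainder."""
    remaining = dict(scores)
    ranking = []
    while remaining:
        items = list(remaining)
        weights = [remaining[i] for i in items]
        pick = rng.choices(items, weights=weights, k=1)[0]
        ranking.append(pick)
        del remaining[pick]
    return ranking

def rank_break_estimate(rankings, i, j):
    """Pairwise rank-breaking: estimate P(i beats j) as the fraction of
    observed rankings in which i precedes j. Under the PL model this
    converges to theta_i / (theta_i + theta_j)."""
    wins = sum(r.index(i) < r.index(j) for r in rankings)
    return wins / len(rankings)
```

For example, with equal scores `{'a': 1.0, 'b': 1.0}` each item is chosen with probability 0.5, and the rank-breaking estimate over many sampled rankings approaches the pairwise PL preference `theta_i / (theta_i + theta_j)`.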

Authors (2)
