
Multi-Armed Bandits with Network Interference

Published 28 May 2024 in cs.LG, stat.ME, and stat.ML | (2405.18621v1)

Abstract: Online experimentation with interference is a common challenge in modern applications such as e-commerce and adaptive clinical trials in medicine. For example, in online marketplaces, the revenue of a good depends on discounts applied to competing goods. Statistical inference with interference is widely studied in the offline setting, but far less is known about how to adaptively assign treatments to minimize regret. We address this gap by studying a multi-armed bandit (MAB) problem where a learner (e-commerce platform) sequentially assigns one of $\mathcal{A}$ possible actions (discounts) to each of $N$ units (goods) over $T$ rounds to minimize regret (maximize revenue). Unlike traditional MAB problems, the reward of each unit depends on the treatments assigned to other units, i.e., there is interference across the underlying network of units. With $\mathcal{A}$ actions and $N$ units, minimizing regret is combinatorially difficult since the action space grows as $\mathcal{A}^N$. To overcome this issue, we study a sparse network interference model, where the reward of a unit is only affected by the treatments assigned to $s$ neighboring units. We use tools from discrete Fourier analysis to develop a sparse linear representation of the unit-specific reward $r_n: [\mathcal{A}]^N \rightarrow \mathbb{R}$, and propose simple, linear regression-based algorithms to minimize regret. Importantly, our algorithms achieve provably low regret both when the learner observes the interference neighborhood for all units and when the neighborhood is unknown. This significantly generalizes other works on this topic, which impose strict conditions on the strength of interference on a known network and also compare regret to a markedly weaker optimal action. Empirically, we corroborate our theoretical findings via numerical simulations.
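
The sparse linear representation mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes binary actions (so $\mathcal{A} = 2$), a known neighborhood for a single unit, uniformly random action assignments, and ordinary least squares. The names `fourier_features` and `fit_unit_reward`, the neighborhood `(0, 3)`, and the coefficient vector `theta` are hypothetical choices made for illustration. The point it shows is that a reward depending on only $s$ neighbors needs $2^s$ parity features rather than $2^N$ table entries.

```python
import itertools
import numpy as np

def fourier_features(actions, neighborhood):
    """Parity features over all subsets of one unit's neighborhood.

    Assumes binary actions in {0, 1}; each is mapped to a sign in {+1, -1}.
    Returns a vector of length 2**len(neighborhood).
    """
    signs = 1.0 - 2.0 * np.asarray(actions, dtype=float)
    feats = []
    for k in range(len(neighborhood) + 1):
        for subset in itertools.combinations(neighborhood, k):
            # Product of signs over the subset; the empty product is 1.
            feats.append(np.prod([signs[i] for i in subset]))
    return np.array(feats)

def fit_unit_reward(action_history, rewards, neighborhood):
    """Least-squares estimate of the Fourier coefficients of one unit's reward."""
    X = np.array([fourier_features(a, neighborhood) for a in action_history])
    theta_hat, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    return theta_hat

# Illustrative simulation (all values below are assumptions).
rng = np.random.default_rng(0)
N, T = 6, 200                  # 6 units, 200 rounds
neighborhood = (0, 3)          # s = 2 units affect this unit's reward
theta = np.array([1.0, 0.5, -0.3, 0.2])  # 2**s = 4 coefficients

action_history = rng.integers(0, 2, size=(T, N))
X_true = np.array([fourier_features(a, neighborhood) for a in action_history])
rewards = X_true @ theta + 0.01 * rng.standard_normal(T)

theta_hat = fit_unit_reward(action_history, rewards, neighborhood)
```

Here the full action space has $2^6 = 64$ elements, while the regression estimates only 4 coefficients for this unit. When the neighborhood is unknown, a sparsity-inducing regression over a larger feature set would be one natural substitute, though that is an assumption rather than something the abstract specifies.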
