Confounded Budgeted Causal Bandits
Abstract: We study the problem of learning 'good' interventions in a stochastic environment modeled by an underlying causal graph, where a 'good' intervention is one that maximizes the expected reward. Specifically, we consider a pre-specified budget constraint under which interventions can have non-uniform costs. We show that this problem can be formulated as maximizing the expected reward of a stochastic multi-armed bandit with side information. We propose an algorithm that minimizes cumulative regret in general causal graphs by trading off observations and interventions according to their costs; it generalizes state-of-the-art methods by allowing both non-uniform costs and hidden confounders in the causal graph. We further develop an algorithm that minimizes simple regret in the same budgeted setting with non-uniform costs and general causal graphs. We provide theoretical guarantees, including both upper and lower bounds, as well as empirical evaluations showing that our algorithms outperform the state of the art.
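The central trade-off the abstract describes — spending a limited budget on actions with non-uniform costs while maximizing expected reward — can be illustrated with a minimal sketch. This is not the paper's algorithm; the function name, the cost-normalized UCB rule, and the toy Bernoulli environment below are all illustrative assumptions.

```python
import math
import random

def budgeted_ucb(means, costs, budget, seed=0):
    """Toy budgeted bandit: repeatedly pick the affordable arm with the
    highest UCB-per-unit-cost until the budget is exhausted.
    Illustrative sketch only, not the algorithm from the paper."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n        # pulls per arm
    sums = [0.0] * n        # cumulative reward per arm
    spent, total_reward, t = 0.0, 0.0, 0
    while True:
        # arms still affordable with the remaining budget
        feasible = [i for i in range(n) if costs[i] <= budget - spent]
        if not feasible:
            break
        t += 1
        # play each affordable arm once before using confidence bounds
        untried = [i for i in feasible if counts[i] == 0]
        if untried:
            arm = untried[0]
        else:
            def ucb_per_cost(i):
                mean = sums[i] / counts[i]
                bonus = math.sqrt(2.0 * math.log(t) / counts[i])
                return (mean + bonus) / costs[i]  # reward per unit cost
            arm = max(feasible, key=ucb_per_cost)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        spent += costs[arm]
        total_reward += reward
    return total_reward, counts
```

In the paper's setting, the feasible action set would also include cheap passive observations of the causal graph alongside costlier interventions; the sketch only captures the budget-aware arm selection, not the causal side information.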