M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
Abstract: Uplift modeling predicts the effect of a treatment (e.g., a discount) on an individual's response. Several methods have been proposed for multi-valued treatments, but they are extensions of binary-treatment methods and retain two limitations. First, they compute uplift from predicted responses, which may not guarantee a consistent uplift distribution between treatment and control groups and may cause cumulative errors when the treatment is multi-valued. Second, the many prediction heads inflate the number of model parameters, reducing efficiency. To address these issues, we propose a novel Multi-gate Mixture-of-Experts based Multi-valued Treatment Network (M$^3$TN). M$^3$TN consists of two components: 1) a feature representation module with Multi-gate Mixture-of-Experts to improve efficiency; 2) a reparameterization module that models uplift explicitly to improve effectiveness. Extensive experiments demonstrate the effectiveness and efficiency of M$^3$TN.
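To make the two components concrete, the following is a minimal NumPy sketch of a forward pass, not the authors' implementation. It assumes one plausible reading of the abstract: shared experts with one softmax gate per output head, a head for the control response, and one head per non-control treatment value that outputs the uplift directly, so that each treated response is reparameterized as control response plus uplift. The class name `MMoEUpliftSketch`, the layer sizes, and the linear heads are all hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MMoEUpliftSketch:
    """Hypothetical forward pass: shared experts, one gate per head,
    a control-response head, and one explicit uplift head per treatment."""

    def __init__(self, n_features, n_experts, expert_dim, n_treatments, rng):
        # n_treatments counts the non-control treatment values (assumption).
        n_heads = 1 + n_treatments  # head 0: control response; heads 1..K: uplift
        self.W_expert = rng.normal(scale=0.1, size=(n_experts, n_features, expert_dim))
        self.W_gate = rng.normal(scale=0.1, size=(n_heads, n_features, n_experts))
        self.w_head = rng.normal(scale=0.1, size=(n_heads, expert_dim))

    def forward(self, x):
        # Shared expert outputs with ReLU: (batch, n_experts, expert_dim)
        experts = np.maximum(0.0, np.einsum("bf,efd->bed", x, self.W_expert))
        # One gate per head; weights over experts sum to 1: (batch, n_heads, n_experts)
        gates = softmax(np.einsum("bf,hfe->bhe", x, self.W_gate), axis=-1)
        # Head-specific mixture of the shared experts: (batch, n_heads, expert_dim)
        mixed = np.einsum("bhe,bed->bhd", gates, experts)
        # One scalar per head: (batch, n_heads)
        out = np.einsum("bhd,hd->bh", mixed, self.w_head)
        control = out[:, 0]   # predicted response under control
        uplift = out[:, 1:]   # uplift of each treatment value, modeled directly
        # Reparameterization (assumed form): treated response = control + uplift
        treated = control[:, None] + uplift
        return control, uplift, treated


model = MMoEUpliftSketch(n_features=5, n_experts=3, expert_dim=4,
                         n_treatments=2, rng=np.random.default_rng(0))
x = np.random.default_rng(1).normal(size=(6, 5))
control, uplift, treated = model.forward(x)
```

In this sketch the experts are shared across all heads, so adding a treatment value adds only one gate and one head rather than a full sub-network. Because the uplift is a direct output, it does not have to be recovered by subtracting two separately predicted responses.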