Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints
Abstract: Learning to defer (L2D) aims to improve human-AI collaboration systems by learning when to defer decisions to humans, namely when they are more likely to be correct than an ML classifier. Existing L2D research overlooks key real-world aspects that impede its practical adoption: i) it neglects cost-sensitive scenarios, where type I and type II errors have different costs; ii) it requires concurrent human predictions for every instance of the training dataset; and iii) it does not deal with human work-capacity constraints. To address these issues, we propose the deferral under cost and capacity constraints framework (DeCCaF). DeCCaF is a novel L2D approach that employs supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and uses constraint programming to globally minimize the error cost, subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of nine synthetic fraud analysts, each subject to individual work-capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average 8.4% reduction in misclassification cost. The code used for the experiments is available at https://github.com/feedzai/deccaf.
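The deferral step the abstract describes — globally minimizing expected misclassification cost subject to per-expert workload limits — can be illustrated with a toy exhaustive search. This is only a sketch under made-up numbers, not the paper's method: DeCCaF solves this assignment problem at scale with constraint programming (CP-SAT), whereas the brute force below is feasible only for a handful of instances.

```python
from itertools import product

def optimal_deferral(costs, capacity):
    """Exhaustively find the cheapest assignment of instances to
    decision-makers, subject to per-decision-maker workload capacities.
    costs[i][j] = estimated misclassification cost if maker j handles
    instance i; capacity[j] = max cases maker j may take."""
    n_makers = len(costs[0])
    best_cost, best_assign = float("inf"), None
    for assign in product(range(n_makers), repeat=len(costs)):
        # Skip assignments that violate any workload constraint.
        if any(assign.count(j) > capacity[j] for j in range(n_makers)):
            continue
        total = sum(costs[i][j] for i, j in enumerate(assign))
        if total < best_cost:
            best_cost, best_assign = total, assign
    return best_cost, best_assign

# Column 0 = ML model, columns 1-2 = two analysts (illustrative numbers).
costs = [[0.9, 0.2, 0.8],
         [0.1, 0.5, 0.6],
         [0.7, 0.3, 0.1],
         [0.4, 0.9, 0.9]]
capacity = [4, 1, 1]  # each analyst can review at most one case

cost, assignment = optimal_deferral(costs, capacity)
print(assignment)  # cheapest feasible routing: defer 0 and 2, keep 1 and 3
```

In this example the optimum defers instance 0 to analyst 1 and instance 2 to analyst 2, keeping instances 1 and 3 with the model, for a total expected cost of about 0.8; note that greedily giving every instance its cheapest decision-maker can violate the capacity constraints, which is why the minimization must be global.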