
Multi-group Learning for Hierarchical Groups

Published 1 Feb 2024 in cs.LG | (2402.00258v3)

Abstract: The multi-group learning model formalizes the learning scenario in which a single predictor must generalize well on multiple, possibly overlapping subgroups of interest. We extend the study of multi-group learning to the natural case where the groups are hierarchically structured. We design an algorithm for this setting that outputs an interpretable and deterministic decision tree predictor with near-optimal sample complexity. We then conduct an empirical evaluation of our algorithm and find that it achieves attractive generalization properties on real datasets with hierarchical group structure.
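
To make the setting concrete, the following is a minimal sketch of a predictor organized along a group hierarchy, together with a per-group error report. It is an illustration under stated assumptions, not the algorithm from the paper: the names `GroupNode`, `predict`, and `group_errors`, the `region` feature, and the toy data are all hypothetical, and the per-group predictors are fixed by hand rather than learned. It assumes each child group is a subset of its parent and that sibling groups are disjoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Example = Dict[str, str]
Predictor = Callable[[Example], int]


@dataclass
class GroupNode:
    """One group in the hierarchy (hypothetical structure for illustration)."""
    name: str
    member: Callable[[Example], bool]      # membership test for this group
    children: List["GroupNode"] = field(default_factory=list)
    predictor: Optional[Predictor] = None  # None means: inherit from an ancestor


def predict(root: GroupNode, x: Example) -> int:
    """Descend the hierarchy and use the deepest predictor on the path of x."""
    node, current = root, root.predictor
    while True:
        # Assumes sibling groups are disjoint, so at most one child matches.
        nxt = next((c for c in node.children if c.member(x)), None)
        if nxt is None:
            return current(x)
        node = nxt
        if node.predictor is not None:
            current = node.predictor


def group_errors(root: GroupNode, data: List[Tuple[Example, int]]) -> Dict[str, float]:
    """Empirical 0-1 error of the tree predictor on every non-empty group."""
    errors: Dict[str, float] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        members = [(x, y) for x, y in data if node.member(x)]
        if members:
            wrong = sum(predict(root, x) != y for x, y in members)
            errors[node.name] = wrong / len(members)
        stack.extend(node.children)
    return errors


# Toy hierarchy: "all" contains the disjoint groups "north" and "south".
north = GroupNode("north", lambda x: x["region"] == "north")
south = GroupNode("south", lambda x: x["region"] == "south",
                  predictor=lambda x: 0)  # group-specific override
root = GroupNode("all", lambda x: True, [north, south],
                 predictor=lambda x: 1)   # default predictor for everyone

data = [
    ({"region": "north"}, 1),
    ({"region": "north"}, 1),
    ({"region": "south"}, 0),
    ({"region": "south"}, 1),
]

report = group_errors(root, data)
print(report)
```

Because every example follows a single root-to-leaf path, the prediction is deterministic and can be read off the tree, which is the sense in which such a predictor is interpretable. Reporting the error separately for each group, rather than only the overall error, is what distinguishes a multi-group evaluation from a standard one.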


Authors (2)
