- The paper introduces new datasets that expand beyond the UCI Adult dataset to improve fairness intervention assessments.
- It reveals that fairness metrics and intervention effectiveness vary significantly with geographic and temporal distribution shifts.
- The research offers a Python package for streamlined access to comprehensive US Census data, facilitating robust fair ML studies.
Retiring Adult: New Datasets for Fair Machine Learning
The paper "Retiring Adult: New Datasets for Fair Machine Learning" addresses a significant limitation in the field of algorithmic fairness research: the over-reliance on a small subset of datasets, particularly the UCI Adult dataset, for evaluating fair machine learning methods. This reliance on the UCI Adult dataset—a tabular dataset derived from the 1994 US Census—has potentially restricted the external validity and generalizability of fairness interventions. In response, the authors reconstruct a superset of the UCI Adult dataset and propose a suite of new datasets derived from comprehensive US Census sources, designed to broaden the empirical foundation of algorithmic fairness research.
Examination of UCI Adult Dataset
The paper begins with a critical analysis of the UCI Adult dataset. Despite its widespread use, the dataset has intrinsic limitations. A central issue is its binary target variable: whether an individual's income exceeds $50,000. This threshold is problematic because it corresponds to significantly different income quantiles across demographic groups; for Black and female populations it sits at roughly the 88th and 89th quantiles, respectively, making the positive class far rarer for these groups than for the population overall. The authors show that the choice of income threshold crucially affects both the magnitude of measured fairness violations and the apparent effectiveness of interventions, underscoring how strongly conclusions drawn from this dataset depend on an arbitrary design decision.
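As a toy illustration of this sensitivity (synthetic incomes, not the paper's actual figures), the quantile that a fixed dollar threshold occupies within each group can be computed directly; when two groups have different income distributions, the same cutoff lands at very different quantiles:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical log-normal income samples for two groups with different
# medians (synthetic stand-ins for Census income data).
group_a = rng.lognormal(mean=10.9, sigma=0.6, size=100_000)  # higher median
group_b = rng.lognormal(mean=10.4, sigma=0.6, size=100_000)  # lower median

def threshold_quantile(incomes, threshold):
    """Fraction of the group earning at or below the threshold."""
    return float(np.mean(incomes <= threshold))

threshold = 50_000.0
q_a = threshold_quantile(group_a, threshold)
q_b = threshold_quantile(group_b, threshold)

# The same $50k cutoff sits at a noticeably higher quantile for the
# lower-income group, so the positive class is much rarer there.
print(f"group A quantile: {q_a:.2f}, group B quantile: {q_b:.2f}")
```

Any fairness metric computed on the resulting binary labels inherits this asymmetry, which is why the choice of threshold alone can change measured violations.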
New Dataset Contributions
To mitigate these issues, the paper presents new datasets derived from two US Census products: the Public Use Microdata Sample from the American Community Survey and the Annual Social and Economic Supplement of the Current Population Survey. These datasets cover broad geographic and temporal scales, facilitating the study of distribution shifts. They include features related to income, employment, health, transportation, and housing, offering a more diverse and robust foundation for algorithmic fairness research. The datasets are accompanied by a Python package, folktables, enabling easy access and manipulation for research purposes.
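A minimal sketch of the access pattern folktables provides follows. The commented lines show the package's documented interface (ACSDataSource and the ACSIncome task), which requires a network download; a hypothetical synthetic frame stands in below so the sketch runs offline:

```python
import numpy as np
import pandas as pd

# With the real package (network download required), the pattern is:
#   from folktables import ACSDataSource, ACSIncome
#   data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
#   acs_data = data_source.get_data(states=["CA"], download=True)
#   features, label, group = ACSIncome.df_to_numpy(acs_data)
# Below, a synthetic frame stands in for the downloaded ACS microdata.

rng = np.random.default_rng(1)
n = 1_000
df = pd.DataFrame({
    "AGEP": rng.integers(18, 90, size=n),        # age
    "WKHP": rng.integers(1, 80, size=n),         # usual hours worked per week
    "PINCP": rng.lognormal(10.5, 0.7, size=n),   # total person income
    "RAC1P": rng.integers(1, 3, size=n),         # race code (synthetic values)
})

def df_to_task(df, feature_cols, target_col, group_col, threshold):
    """Mimics the task abstraction: feature matrix, binary label, group attribute."""
    features = df[feature_cols].to_numpy(dtype=float)
    label = (df[target_col] > threshold).to_numpy()
    group = df[group_col].to_numpy()
    return features, label, group

features, label, group = df_to_task(df, ["AGEP", "WKHP"], "PINCP", "RAC1P", 50_000)
print(features.shape, round(float(label.mean()), 2))
```

Because the threshold is a parameter of the task rather than baked into the data, researchers can vary it, the survey year, and the set of states to probe exactly the sensitivities discussed above.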
Empirical Insights and Observations
Several substantial observations emerge from the empirical investigations using these datasets:
- Variation Across Populations: Fairness criteria and intervention effectiveness vary considerably by state and over time. Training on one state and testing on another frequently produces unpredictable outcomes, with both accuracy and fairness metrics degrading in hard-to-anticipate ways.
- Locus of Intervention: Fairness interventions exhibit different results depending on whether applied at a national or state-specific level, highlighting the need for context-aware approaches and guidance for practitioners.
- No Assured Progress with Increased Data: Contrary to expectations, neither larger training sets nor the mere passage of time reliably reduces disparities in these tabular prediction tasks, even though more data often narrows group performance gaps in other machine learning domains. Persistent social inequalities reflected in demographic data mean that dataset design choices play a crucial role in how error burdens are distributed.
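The train-on-one-state, test-on-another protocol behind the first observation can be sketched as follows. Everything here is a hypothetical stand-in (synthetic "states" with a shifted feature distribution and a shifted labeling rule, plus a minimal logistic regression); it illustrates the evaluation loop, not any real state-level result:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_state(n, mean_shift, label_cut):
    """Synthetic 'state': one feature whose distribution and labeling rule
    both differ by state, a crude proxy for geographic distribution shift."""
    x = rng.normal(mean_shift, 1.0, size=(n, 1))
    y = (x[:, 0] + rng.normal(0.0, 0.5, size=n) > label_cut).astype(float)
    return x, y

def fit_logreg(x, y, lr=0.1, steps=500):
    """Minimal logistic regression (bias + weights) fit by gradient descent."""
    xb = np.hstack([np.ones((len(x), 1)), x])
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        w -= lr * xb.T @ (p - y) / len(y)
    return w

def accuracy(w, x, y):
    xb = np.hstack([np.ones((len(x), 1)), x])
    return float(np.mean((xb @ w > 0) == y))

x_a, y_a = make_state(5_000, mean_shift=0.0, label_cut=0.5)  # "training state"
x_b, y_b = make_state(5_000, mean_shift=1.0, label_cut=1.0)  # shifted "deployment state"

w = fit_logreg(x_a, y_a)
acc_in = accuracy(w, x_a, y_a)
acc_out = accuracy(w, x_b, y_b)
print(f"in-state acc: {acc_in:.2f}, out-of-state acc: {acc_out:.2f}")
```

With per-group accuracies or positive rates added to the evaluation, the same loop measures how fairness metrics, not just accuracy, move under such shifts.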
Implications and Future Directions
The availability of these new datasets and initial empirical findings have several implications for future research and practical applications:
- The datasets provide a means to reevaluate the empirical foundations of algorithmic fairness interventions, thereby prompting a broader range of empirical studies.
- The observations challenge existing assumptions in the field, especially regarding the transferability and context-dependence of fairness metrics and interventions.
- Insights could inform policymakers on potential variability in algorithmic impacts across locations and contexts, motivating region-specific interventions.
Looking ahead, further work should explore methodological advancements that can navigate geographic and temporal distribution shifts effectively. Additionally, the datasets serve as a fertile ground for developing causal inference methods and exploring fairness under varying societal dynamics. This contribution not only tackles current empirical limitations but lays the groundwork for a more nuanced and comprehensive exploration of algorithmic fairness in machine learning.