- The paper introduces new datasets that expand beyond the UCI Adult dataset to improve fairness intervention assessments.
- It reveals that fairness metrics and intervention effectiveness vary significantly with geographic and temporal distribution shifts.
- The research offers a Python package for streamlined access to comprehensive US Census data, facilitating robust fair ML studies.
Retiring Adult: New Datasets for Fair Machine Learning
The paper "Retiring Adult: New Datasets for Fair Machine Learning" addresses a significant limitation in the field of algorithmic fairness research: the over-reliance on a small subset of datasets, particularly the UCI Adult dataset, for evaluating fair machine learning methods. This reliance on the UCI Adult dataset—a tabular dataset derived from the 1994 US Census—has potentially restricted the external validity and generalizability of fairness interventions. In response, the authors reconstruct a superset of the UCI Adult dataset and propose a suite of new datasets derived from comprehensive US Census sources, designed to broaden the empirical foundation of algorithmic fairness research.
Examination of UCI Adult Dataset
The paper begins with a critical analysis of the UCI Adult dataset. Despite its widespread use, the dataset has intrinsic limitations. A central issue is its binary target variable: whether an individual's income exceeds $50,000. This threshold is problematic because it corresponds to significantly different income quantiles across demographic groups; for Black and female populations it sits at roughly the 88th and 89th quantiles, respectively, making the positive class far rarer for these groups than for the population overall. The authors show that the choice of income threshold crucially affects both the magnitude of measured fairness violations and the apparent effectiveness of interventions, underscoring how strongly conclusions drawn from this dataset depend on an arbitrary design decision.
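As a toy illustration of this sensitivity (synthetic incomes, not the paper's actual figures), the quantile that a fixed dollar threshold occupies within each group can be computed directly; when two groups have different income distributions, the same cutoff lands at very different quantiles:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical log-normal income samples for two groups with different
# medians (synthetic stand-ins for Census income data).
group_a = rng.lognormal(mean=10.9, sigma=0.6, size=100_000)  # higher median
group_b = rng.lognormal(mean=10.4, sigma=0.6, size=100_000)  # lower median

def threshold_quantile(incomes, threshold):
    """Fraction of the group earning at or below the threshold."""
    return float(np.mean(incomes <= threshold))

threshold = 50_000.0
q_a = threshold_quantile(group_a, threshold)
q_b = threshold_quantile(group_b, threshold)

# The same $50k cutoff sits at a noticeably higher quantile for the
# lower-income group, so the positive class is much rarer there.
print(f"group A quantile: {q_a:.2f}, group B quantile: {q_b:.2f}")
```

Any fairness metric computed on the resulting binary labels inherits this asymmetry, which is why the choice of threshold alone can change measured violations.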
New Dataset Contributions
To mitigate these issues, the paper presents new datasets derived from two US Census products: the Public Use Microdata Sample from the American Community Survey and the Annual Social and Economic Supplement of the Current Population Survey. These datasets cover broad geographic and temporal scales, facilitating the study of distribution shifts. They include features related to income, employment, health, transportation, and housing, offering a more diverse and robust foundation for algorithmic fairness research. The datasets are accompanied by a Python package, folktables, enabling easy access and manipulation for research purposes.
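A minimal sketch of the access pattern folktables provides follows. The commented lines show the package's documented interface (ACSDataSource and the ACSIncome task), which requires a network download; a hypothetical synthetic frame stands in below so the sketch runs offline:

```python
import numpy as np
import pandas as pd

# With the real package (network download required), the pattern is:
#   from folktables import ACSDataSource, ACSIncome
#   data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
#   acs_data = data_source.get_data(states=["CA"], download=True)
#   features, label, group = ACSIncome.df_to_numpy(acs_data)
# Below, a synthetic frame stands in for the downloaded ACS microdata.

rng = np.random.default_rng(1)
n = 1_000
df = pd.DataFrame({
    "AGEP": rng.integers(18, 90, size=n),        # age
    "WKHP": rng.integers(1, 80, size=n),         # usual hours worked per week
    "PINCP": rng.lognormal(10.5, 0.7, size=n),   # total person income
    "RAC1P": rng.integers(1, 3, size=n),         # race code (synthetic values)
})

def df_to_task(df, feature_cols, target_col, group_col, threshold):
    """Mimics the task abstraction: feature matrix, binary label, group attribute."""
    features = df[feature_cols].to_numpy(dtype=float)
    label = (df[target_col] > threshold).to_numpy()
    group = df[group_col].to_numpy()
    return features, label, group

features, label, group = df_to_task(df, ["AGEP", "WKHP"], "PINCP", "RAC1P", 50_000)
print(features.shape, round(float(label.mean()), 2))
```

Because the threshold is a parameter of the task rather than baked into the data, researchers can vary it, the survey year, and the set of states to probe exactly the sensitivities discussed above.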
Empirical Insights and Observations
Several substantial observations emerge from the empirical investigations using these datasets:
- Variation Across Populations: Fairness criteria and intervention effectiveness vary considerably by state and over time. Training on one state and testing on another frequently produces unpredictable outcomes, with both accuracy and fairness metrics degrading in hard-to-anticipate ways.
- Locus of Intervention: Fairness interventions exhibit different results depending on whether applied at a national or state-specific level, highlighting the need for context-aware approaches and guidance for practitioners.
- No Assured Progress with Increased Data: Contrary to expectations, neither larger training sets nor the mere passage of time reliably reduces disparities in these tabular prediction tasks, even though more data often narrows group performance gaps in other machine learning domains. Persistent social inequalities reflected in demographic data mean that dataset design choices play a crucial role in how error burdens are distributed.
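The train-on-one-state, test-on-another protocol behind the first observation can be sketched as follows. Everything here is a hypothetical stand-in (synthetic "states" with a shifted feature distribution and a shifted labeling rule, plus a minimal logistic regression); it illustrates the evaluation loop, not any real state-level result:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_state(n, mean_shift, label_cut):
    """Synthetic 'state': one feature whose distribution and labeling rule
    both differ by state, a crude proxy for geographic distribution shift."""
    x = rng.normal(mean_shift, 1.0, size=(n, 1))
    y = (x[:, 0] + rng.normal(0.0, 0.5, size=n) > label_cut).astype(float)
    return x, y

def fit_logreg(x, y, lr=0.1, steps=500):
    """Minimal logistic regression (bias + weights) fit by gradient descent."""
    xb = np.hstack([np.ones((len(x), 1)), x])
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        w -= lr * xb.T @ (p - y) / len(y)
    return w

def accuracy(w, x, y):
    xb = np.hstack([np.ones((len(x), 1)), x])
    return float(np.mean((xb @ w > 0) == y))

x_a, y_a = make_state(5_000, mean_shift=0.0, label_cut=0.5)  # "training state"
x_b, y_b = make_state(5_000, mean_shift=1.0, label_cut=1.0)  # shifted "deployment state"

w = fit_logreg(x_a, y_a)
acc_in = accuracy(w, x_a, y_a)
acc_out = accuracy(w, x_b, y_b)
print(f"in-state acc: {acc_in:.2f}, out-of-state acc: {acc_out:.2f}")
```

With per-group accuracies or positive rates added to the evaluation, the same loop measures how fairness metrics, not just accuracy, move under such shifts.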
Implications and Future Directions
The availability of these new datasets and initial empirical findings have several implications for future research and practical applications:
- The datasets provide a means to reevaluate the empirical foundations of algorithmic fairness interventions, thereby prompting a broader range of empirical studies.
- The observations challenge existing assumptions in the field, especially regarding the transferability and context-dependence of fairness metrics and interventions.
- Insights could inform policymakers on potential variability in algorithmic impacts across locations and contexts, motivating region-specific interventions.
Looking ahead, further work should explore methodological advancements that can navigate geographic and temporal distribution shifts effectively. Additionally, the datasets serve as a fertile ground for developing causal inference methods and exploring fairness under varying societal dynamics. This contribution not only tackles current empirical limitations but lays the groundwork for a more nuanced and comprehensive exploration of algorithmic fairness in machine learning.