Quantifying the Uncertainty of Imputed Demographic Disparity Estimates: The Dual-Bootstrap
Abstract: Measuring average differences in an outcome across racial or ethnic groups is a crucial first step for equity assessments, but researchers often lack access to data on individuals' races and ethnicities to calculate them. A common solution is to impute the missing race or ethnicity labels using proxies, then use those imputations to estimate the disparity. Conventional standard errors mischaracterize the resulting estimate's uncertainty because they treat the imputation model as given and fixed, instead of as an unknown object that must be estimated with uncertainty. We propose a dual-bootstrap approach that explicitly accounts for measurement uncertainty and thus enables more accurate statistical inference, which we demonstrate via simulation. In addition, we adapt our approach to the commonly used Bayesian Improved Surname Geocoding (BISG) imputation algorithm, where direct bootstrapping is infeasible because the underlying Census Bureau data are unavailable. In simulations, we find that measurement uncertainty is generally insignificant for BISG except in particular circumstances; bias, not variance, is likely the predominant source of error. We apply our method to quantify the uncertainty of prevalence estimates of common health conditions by race using data from the American Family Cohort.
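To make the idea in the abstract concrete, the following is a minimal sketch of one way a dual bootstrap could be organized. It is not the authors' implementation. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: a binary group label, a categorical proxy, a cell-frequency imputation model fitted on a separate labeled calibration sample, a probability-weighted disparity estimator, and all function and variable names (`fit_imputation`, `weighted_disparity`, `bootstrap_se`, `simulate`). Each replicate resamples the calibration data and refits the imputation model, and also resamples the analysis data. Setting `dual=False` instead holds the fitted imputation model fixed, which mimics the conventional approach the abstract criticizes.

```python
import numpy as np


def fit_imputation(z, r, n_cells):
    """Estimate P(R = 1 | Z = z) per proxy cell from labeled data (add-one smoothed)."""
    counts = np.bincount(z, minlength=n_cells)
    positives = np.bincount(z, weights=r, minlength=n_cells)
    return (positives + 1.0) / (counts + 2.0)


def weighted_disparity(p, y):
    """Difference in probability-weighted mean outcomes between the two groups."""
    mean_group1 = np.sum(p * y) / np.sum(p)
    mean_group0 = np.sum((1.0 - p) * y) / np.sum(1.0 - p)
    return mean_group1 - mean_group0


def bootstrap_se(z_cal, r_cal, z_obs, y_obs, n_cells, n_boot=300, dual=True, seed=0):
    """Bootstrap standard error of the imputed disparity estimate.

    dual=True  : resample the calibration data (refitting the imputation model)
                 and the analysis data in every replicate.
    dual=False : keep the imputation model fixed; resample the analysis data only.
    """
    rng = np.random.default_rng(seed)
    p_fixed = fit_imputation(z_cal, r_cal, n_cells)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        if dual:
            i = rng.integers(0, len(z_cal), len(z_cal))
            p_cell = fit_imputation(z_cal[i], r_cal[i], n_cells)
        else:
            p_cell = p_fixed
        j = rng.integers(0, len(z_obs), len(z_obs))
        reps[b] = weighted_disparity(p_cell[z_obs[j]], y_obs[j])
    return reps.std(ddof=1)


def simulate(n_cal=100, n_obs=20_000, n_cells=10, seed=1):
    """Hypothetical synthetic data: small labeled sample, large unlabeled sample."""
    rng = np.random.default_rng(seed)
    q = np.linspace(0.1, 0.9, n_cells)  # assumed true P(R = 1 | Z = z)
    z_cal = rng.integers(0, n_cells, n_cal)
    r_cal = rng.binomial(1, q[z_cal]).astype(float)
    z_obs = rng.integers(0, n_cells, n_obs)
    r_obs = rng.binomial(1, q[z_obs])  # true labels, never shown to the estimator
    y_obs = rng.binomial(1, np.where(r_obs == 1, 0.3, 0.1)).astype(float)
    return z_cal, r_cal, z_obs, y_obs


if __name__ == "__main__":
    z_cal, r_cal, z_obs, y_obs = simulate()
    print("fixed-model SE:", bootstrap_se(z_cal, r_cal, z_obs, y_obs, 10, dual=False))
    print("dual-bootstrap SE:", bootstrap_se(z_cal, r_cal, z_obs, y_obs, 10, dual=True))
```

In this toy setup the calibration sample is deliberately small relative to the analysis sample, so the uncertainty in the fitted imputation model dominates and the fixed-model standard error understates it. The sketch does not cover the BISG case, where the abstract notes that direct bootstrapping is infeasible because the underlying Census Bureau data are unavailable.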