Implementing multiple imputation for missing data in longitudinal studies when models are not feasible: A tutorial on the random hot deck approach

Published 14 Apr 2020 in stat.ME | (2004.06630v4)

Abstract: Objective: Researchers often use model-based multiple imputation to handle data that are missing at random, minimizing bias while making the best use of all available data. However, constraints within the data sometimes make model-based imputation difficult and may result in implausible imputed values. For these contexts, we describe how to use random hot deck imputation to generate plausible multiple imputations in longitudinal studies. Study Design and Setting: We illustrate random hot deck multiple imputation using The Childhood Health, Activity, and Motor Performance School Study Denmark (CHAMPS-DK), a prospective cohort study that measured weekly sports participation for 1700 Danish schoolchildren. We matched each record with missing data to several observed records, generated probabilities for the matched records using observed data, and sampled from these records according to the probability of each occurring. Because imputed values are generated randomly, multiple complete datasets can be created and analyzed in the same way as with model-based multiple imputation. Conclusion: Multiple imputation using random hot deck imputation is an alternative when model-based approaches are infeasible, specifically where there are constraints within and between covariates.
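
The matching, probability, and sampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `random_hot_deck`, the field names (`grade`, `sex`, `sessions`), and the toy records are all hypothetical, and using the observed frequency of each donor value as its sampling probability is an assumption made here for simplicity.

```python
import random
from collections import Counter


def random_hot_deck(records, target, match_keys, n_imputations=5, seed=0):
    """Return n_imputations completed copies of `records`.

    records    : list of dicts; a missing `target` value is stored as None.
    match_keys : hypothetical matching variables that define the donor pool.
    """
    rng = random.Random(seed)

    # Step 1: group observed target values by the matching variables.
    pools = {}
    for r in records:
        if r[target] is not None:
            key = tuple(r[k] for k in match_keys)
            pools.setdefault(key, []).append(r[target])

    completed = []
    for _ in range(n_imputations):
        dataset = []
        for r in records:
            row = dict(r)  # copy so the original records stay untouched
            if row[target] is None:
                donors = pools.get(tuple(row[k] for k in match_keys))
                if donors:  # records with no matched donor stay missing
                    # Step 2: observed frequencies act as sampling weights
                    # (an assumption of this sketch).
                    counts = Counter(donors)
                    values = list(counts)
                    weights = [counts[v] for v in values]
                    # Step 3: draw one donor value at random.
                    row[target] = rng.choices(values, weights=weights, k=1)[0]
            dataset.append(row)
        completed.append(dataset)
    return completed


# Toy data, invented for illustration only.
data = [
    {"grade": 3, "sex": "F", "sessions": 2},
    {"grade": 3, "sex": "F", "sessions": 2},
    {"grade": 3, "sex": "F", "sessions": 0},
    {"grade": 3, "sex": "F", "sessions": None},
    {"grade": 4, "sex": "M", "sessions": 1},
    {"grade": 4, "sex": "M", "sessions": None},
]
imputed = random_hot_deck(data, "sessions", ["grade", "sex"], n_imputations=5, seed=1)
```

Because every imputed value is copied from an observed record, it automatically satisfies whatever constraints the observed data satisfy. Each of the completed datasets would then be analyzed separately and the estimates pooled, as in model-based multiple imputation.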