Demystifying Statistical Matching Algorithms for Big Data

Published 11 Sep 2023 in stat.ME | (2309.05859v1)

Abstract: Statistical matching is an effective method for estimating causal effects in which treated units are paired with control units with similar'' values of confounding covariates prior to performing estimation. In this way, matching helps isolate the effect of treatment on response from effects due to the confounding covariates. While there are a large number of software packages to perform statistical matching, the algorithms and techniques used to solve statistical matching problems -- especially matching without replacement -- are not widely understood. In this paper, we describe in detail commonly-used algorithms and techniques for solving statistical matching problems. We focus in particular on the efficiency of these algorithms as the number of observations grow large. We advocate for the further development of statistical matching methods that impose and exploitsparsity'' -- by greatly restricting the available matches for a given treated unit -- as this may be critical to ensure scalability of matching methods as data sizes grow large.