Efficient estimation with incomplete data via generalised ANOVA decompositions
Abstract: We study the semiparametric efficient estimation of a class of linear functionals in settings where a complete multivariate dataset is supplemented by additional datasets recording subsets of the variables of interest. These datasets are allowed to have a general, in particular non-monotonic, structure. Our main contribution is to characterise the asymptotic minimal mean squared error for these problems and to introduce an estimator whose risk approximately matches this lower bound. We show that the efficient rescaled variance can be expressed as the minimal value of a quadratic optimisation problem over a function space, thus establishing a fundamental link between these estimation problems and the theory of generalised ANOVA decompositions. Our estimation procedure uses iterated nonparametric regression to mimic an approximate influence function derived through gradient descent. We prove that this estimator is approximately normally distributed, provide an estimator of its variance, and thus develop confidence intervals of asymptotically minimal width. Finally, we present extensions of our theory demonstrating that the framework can be adapted to include various types of sampling bias and non-linear functionals.
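To make the setting concrete, the following is a minimal sketch of the simplest special case only: one complete dataset of pairs (X, Y) supplemented by a single extra dataset recording X alone, with the mean of Y as the target functional. It is not the estimator proposed in the paper. The function name `semi_supervised_mean` and the linear working regression are assumptions made for illustration; the abstract describes iterated nonparametric regression for general, non-monotonic patterns.

```python
import numpy as np


def semi_supervised_mean(x_lab, y_lab, x_unlab):
    """Estimate E[Y] from complete pairs (x_lab, y_lab) plus X-only data.

    Illustrative special case, not the paper's procedure: a regression
    adjustment that borrows strength from the extra X observations.
    """
    x_lab = np.asarray(x_lab, dtype=float)
    y_lab = np.asarray(y_lab, dtype=float)
    x_unlab = np.asarray(x_unlab, dtype=float)

    # Working regression of Y on X (linear least squares; an assumed choice).
    design = np.column_stack([np.ones_like(x_lab), x_lab])
    coef, *_ = np.linalg.lstsq(design, y_lab, rcond=None)

    def predict(x):
        return coef[0] + coef[1] * x

    # Complete-data mean, corrected by the difference between the fitted
    # regression averaged over all X values and over the complete sample.
    x_all = np.concatenate([x_lab, x_unlab])
    return y_lab.mean() - predict(x_lab).mean() + predict(x_all).mean()
```

With no extra X observations the correction term vanishes and the function returns the ordinary sample mean of Y, which is one way to sanity-check the sketch.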