A Framework for Statistical Inference via Randomized Algorithms

Published 20 Jul 2023 in stat.ME, math.ST, stat.CO, and stat.TH | (2307.11255v5)

Abstract: Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs, leading to the problem of evaluating their accuracy. In this paper, we develop a statistical inference framework for quantifying the uncertainty of the outputs of randomized algorithms. Our key conclusion is that one can perform statistical inference for the target of a sequence of randomized algorithms as long as in the limit, their outputs fluctuate around the target according to any (possibly unknown) probability distribution. In this setting, we develop appropriate statistical inference methods -- sub-randomization, multi-run plug-in and multi-run aggregation -- by estimating the unknown parameters of the limiting distribution either using multiple runs of the randomized algorithm, or by tailored estimates. As illustrations, we develop methods for statistical inference when using stochastic optimization (such as Polyak-Ruppert averaging in stochastic gradient descent and stochastic optimization with momentum). We also illustrate our methods in inference for least squares parameters via randomized sketching, by characterizing the limiting distributions of sketching estimates in a possibly growing dimensional case. We further characterize the computation and communication cost of our methods, showing that in certain cases, they add negligible overhead. The results are supported via a broad range of simulations.