Data-Efficient Prediction-Powered Calibration via Cross-Validation

Published 27 Jul 2025 in cs.LG, eess.SP, and stat.ML (arXiv:2507.20268v1)

Abstract: Calibration data are necessary to formally quantify the uncertainty of the decisions produced by an existing AI model. To overcome the common issue of scarce calibration data, a promising approach is to employ synthetic labels produced by a (generally different) predictive model. However, both fine-tuning the label-generating predictor on the inference task of interest and estimating the residual bias of the synthetic labels demand additional data, potentially exacerbating the calibration data scarcity problem. This paper introduces a novel approach that uses limited calibration data efficiently to fine-tune a predictor and estimate the bias of the synthetic labels at the same time. The proposed method yields prediction sets with rigorous coverage guarantees for AI-generated decisions. Experimental results on an indoor localization problem validate the effectiveness and performance gains of the proposed solution.
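
As a rough illustration of the idea described in the abstract (reusing one small labeled set both to fine-tune a label-generating predictor and to estimate the bias of its synthetic labels), the sketch below cross-fits a prediction-powered estimate of a simple mean. It is not the paper's algorithm: it does not build prediction sets and carries none of the stated coverage guarantees. The function name `cross_fitted_ppi_mean`, the linear least-squares "fine-tuning" step, and the scalar-mean target are all assumptions chosen to keep the example small.

```python
import numpy as np


def cross_fitted_ppi_mean(x_lab, y_lab, x_unl, n_folds=5):
    """Illustrative cross-fitted, prediction-powered estimate of E[Y].

    Hypothetical helper, not taken from the paper. For each fold:
      1. fit a predictor on the remaining labeled folds ("fine-tuning"),
      2. average its synthetic labels over the unlabeled inputs,
      3. estimate the bias of those labels on the held-out fold.
    The per-fold bias-corrected estimates are then averaged.
    """
    x_lab = np.asarray(x_lab, dtype=float)
    y_lab = np.asarray(y_lab, dtype=float)
    x_unl = np.asarray(x_unl, dtype=float)

    indices = np.arange(len(y_lab))
    folds = np.array_split(indices, n_folds)

    fold_estimates = []
    for held_out in folds:
        train = np.setdiff1d(indices, held_out)

        # Assumed "fine-tuning": ordinary least squares for y ~ a * x + b.
        design = np.column_stack([x_lab[train], np.ones(len(train))])
        (slope, intercept), *_ = np.linalg.lstsq(design, y_lab[train], rcond=None)

        # Synthetic labels on the unlabeled inputs and on the held-out fold.
        synthetic_unl = slope * x_unl + intercept
        synthetic_held_out = slope * x_lab[held_out] + intercept

        # Bias of the synthetic labels, measured only on data that the
        # predictor was not fitted on.
        bias = np.mean(y_lab[held_out] - synthetic_held_out)

        fold_estimates.append(np.mean(synthetic_unl) + bias)

    return float(np.mean(fold_estimates))
```

Every labeled point is used once for bias estimation and `n_folds - 1` times for fitting, so no separate fine-tuning set has to be carved out of the calibration data. In principle the same fold structure could be applied to whatever loss or coverage quantity a calibration procedure targets, but that step is not shown here.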