Empirical evaluation of uncertainty quantification methods in supervised learning

Develop empirical evaluation methodologies for assessing methods that quantify aleatoric, epistemic, and total predictive uncertainty in supervised learning, despite the absence of ground-truth uncertainty labels in typical datasets.

Background

The paper notes that, unlike target variables, datasets typically do not provide ground-truth uncertainty labels, which makes it difficult to directly assess the correctness of predicted aleatoric, epistemic, or total uncertainty. Consequently, most current evaluations are indirect, for example by measuring the usefulness of uncertainty estimates for improved prediction and decision-making (e.g., through set-valued prediction).
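
As an illustration of such an indirect assessment, the sketch below evaluates uncertainty estimates through set-valued prediction: class-probability outputs are turned into prediction sets, and empirical coverage together with average set size serve as proxies for the quality of the underlying uncertainty. This is not taken from the paper; the function name, the probability-mass threshold, and the way sets are constructed are illustrative assumptions.

```python
import numpy as np

def evaluate_prediction_sets(probs, y_true, threshold=0.9):
    """Indirect evaluation of uncertainty via set-valued prediction (sketch).

    For each instance, classes are added in order of decreasing predicted
    probability until the cumulative mass reaches `threshold`. Well-behaved
    uncertainty estimates should yield sets whose empirical coverage is close
    to `threshold` while remaining small on average.

    probs   : (n, k) array of predicted class probabilities (assumed given)
    y_true  : (n,) array of integer class labels
    returns : (empirical coverage, mean prediction-set size)
    """
    order = np.argsort(-probs, axis=1)                 # classes by decreasing probability
    sorted_p = np.take_along_axis(probs, order, axis=1)
    cum = np.cumsum(sorted_p, axis=1)
    # size of the smallest prefix whose cumulative mass reaches the threshold
    set_sizes = (cum < threshold).sum(axis=1) + 1
    covered = np.array([
        y in order[i, :set_sizes[i]] for i, y in enumerate(y_true)
    ])
    return covered.mean(), set_sizes.mean()
```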

The authors mention accuracy–rejection curves as one such indirect assessment, where a classifier’s accuracy is plotted against the fraction of abstentions. However, they highlight that a principled, general methodology for empirical evaluation of uncertainty quantification methods remains to be established.
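
A minimal sketch of an accuracy–rejection curve follows, assuming per-instance uncertainty scores and hard predictions are available; the function name, the grid of rejection rates, and the choice to break ties by sort order are illustrative assumptions, not details from the paper.

```python
import numpy as np

def accuracy_rejection_curve(uncertainty, y_pred, y_true, n_points=21):
    """Accuracy-rejection curve (sketch): abstain on the most uncertain
    fraction of test points and report accuracy on the retained ones.

    An informative uncertainty measure should typically yield a curve that
    rises towards 1 as the rejection rate grows; a flat curve indicates that
    the uncertainty estimates carry little information about the errors.

    uncertainty : (n,) predicted (aleatoric, epistemic, or total) uncertainty
    y_pred      : (n,) predicted labels
    y_true      : (n,) true labels
    """
    n = len(y_true)
    order = np.argsort(uncertainty)             # most certain instances first
    correct = (y_pred == y_true)[order]
    rejection_rates = np.linspace(0.0, 0.95, n_points)
    accuracies = []
    for r in rejection_rates:
        keep = n - int(np.floor(r * n))         # number of retained predictions
        accuracies.append(correct[:keep].mean())
    return rejection_rates, np.array(accuracies)
```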

References

In addition to theoretical problems of this kind, there are also many open practical questions. This includes, for example, the question of how to perform an empirical evaluation of methods for quantifying uncertainty, whether aleatoric, epistemic, or total.

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods (arXiv:1910.09457, Hüllermeier and Waegeman, 2019), in Discussion and conclusion (Section 5)