- The paper establishes a novel functional theory of randomness that removes the arbitrary additive constants inherent in algorithmic models.
- It develops a taxonomy of eight kinds of confidence predictors, distinguishing the assumptions of randomness and exchangeability in finite data sequences.
- Quantitative bounds and calibration techniques linking p-values and e-values are introduced, strengthening the foundations of conformal prediction practice.
The paper, "Randomness, Exchangeability, and Conformal Prediction" by Vladimir Vovk, offers an in-depth exploration into the functional theory of randomness. This work extends previous results in algorithmic theories of randomness by eliminating the reliance on unspecified additive constants. The paper introduces novel frameworks for confidence predictors, a concept integral to both statistical inference and machine learning. Specifically, it examines randomness predictors and exchangeability predictors, quantifying their deviations from conformal predictors and thus contributing valuable insights into the predictive mechanisms under different assumptions of data generation.
Foundations and Theoretical Implications
The discussion is grounded in the functional theory of randomness, developed in contrast to the algorithmic theory pioneered by Kolmogorov. The algorithmic theory is defined only up to arbitrary universal constants, which limits its practical utility. The functional theory mitigates this drawback by measuring deviations from randomness directly with functions, such as p-variables and e-variables, moving the theory closer to practical use in machine learning and statistics.
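The basic functional objects can be stated precisely. The following are the standard definitions of p- and e-variables; the paper's own formulations may carry extra structure, so treat this as a sketch:

```latex
% Standard definitions, relative to a family \mathcal{P} of probability
% measures on the sample space (e.g., the IID or exchangeable models).
% Informally: a p-variable is small only rarely; an e-variable is large
% only rarely.
\[
  P \text{ is a p-variable for } \mathcal{P}
  \iff
  Q(P \le \epsilon) \le \epsilon
  \quad \text{for all } Q \in \mathcal{P} \text{ and } \epsilon \in (0,1);
\]
\[
  E \text{ is an e-variable for } \mathcal{P}
  \iff
  E \ge 0 \ \text{ and } \ \mathbb{E}_Q[E] \le 1
  \quad \text{for all } Q \in \mathcal{P}.
\]
```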
The concepts of randomness and exchangeability are central to the paper. The assumption of randomness, typical in machine learning, is that the observations are IID (independent and identically distributed); exchangeability is weaker, requiring only that the joint distribution be invariant under permutations of the observations. For infinite sequences, de Finetti's theorem makes the two assumptions essentially equivalent, since every exchangeable distribution is a mixture of IID ones, but for finite sequences the difference becomes salient, providing a rich field for analysis.
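In symbols, using a standard formulation with $\mathbf{Z}$ denoting the observation space:

```latex
% Randomness (IID): the joint law is the n-th power of a single measure Q.
\[
  (Z_1,\dots,Z_n) \sim Q^n
  \quad \text{for some probability measure } Q \text{ on } \mathbf{Z}.
\]
% Exchangeability: the joint law is invariant under every permutation \pi.
\[
  (Z_1,\dots,Z_n) \overset{d}{=} (Z_{\pi(1)},\dots,Z_{\pi(n)})
  \quad \text{for all permutations } \pi \text{ of } \{1,\dots,n\}.
\]
% Every IID law is exchangeable, but not conversely for finite n:
% sampling without replacement from an urn is exchangeable yet not IID.
```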
Predictor Taxonomy and Practical Differentiation
The paper identifies a taxonomy of eight kinds of confidence predictors, classified along three binary axes: the assumption (randomness versus exchangeability), the output (p-values versus e-values), and whether permutation invariance is imposed. Conformal predictors fit within this taxonomy as permutation-invariant exchangeability predictors producing p-values.
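To make the conformal corner of the taxonomy concrete, here is a minimal sketch of a full conformal predictor for classification. The nonconformity measure `distance_to_class_mean` is a hypothetical choice for illustration, not one prescribed by the paper:

```python
import numpy as np

def conformal_set(train, x_new, labels, alpha, score):
    """Full conformal predictor: keep every label y whose conformal
    p-value exceeds alpha. `score(others, x, y)` is any nonconformity
    measure; because it depends on `others` only as a bag, the resulting
    p-value is a permutation-invariant exchangeability p-variable."""
    kept = []
    for y in labels:
        aug = train + [(x_new, y)]  # augment the data with the candidate
        scores = [score([w for j, w in enumerate(aug) if j != i], xi, yi)
                  for i, (xi, yi) in enumerate(aug)]
        # Conformal p-value: fraction of examples at least as strange
        # as the candidate-labelled test example (which sits last in aug).
        p = sum(s >= scores[-1] for s in scores) / len(aug)
        if p > alpha:
            kept.append(y)
    return kept

def distance_to_class_mean(others, x, y):
    """Hypothetical nonconformity measure: distance from x to the mean
    of the other examples carrying the same label y."""
    same = [xx for (xx, yy) in others if yy == y]
    return abs(x - np.mean(same)) if same else np.inf

rng = np.random.default_rng(0)
train = ([(rng.normal(0.0), 0) for _ in range(20)]
         + [(rng.normal(3.0), 1) for _ in range(20)])
print(conformal_set(train, x_new=0.1, labels=[0, 1], alpha=0.1,
                    score=distance_to_class_mean))  # typically prints [0]
```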
The distinction between randomness and exchangeability predictors is elucidated, especially for finite sequences. The paper highlights Kolmogorov's step, an essential part of the taxonomy that connects randomness to exchangeability, and introduces configuration randomness: randomness of the permutation-invariant multiset configuration (the bag) of a sequence's elements.
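A standard way to express the step, as a sketch only (the paper's configuration-randomness machinery is more general): under the IID model, the ordering of the data carries no information beyond its bag. Assuming distinct values for simplicity,

```latex
% Conditionally on the bag (multiset) of observed values, all n!
% orderings are equally likely under any IID law Q^n:
\[
  \Pr\Bigl( (Z_1,\dots,Z_n) = (z_{\pi(1)},\dots,z_{\pi(n)})
      \ \Big|\ \{\!\{Z_1,\dots,Z_n\}\!\} = \{\!\{z_1,\dots,z_n\}\!\} \Bigr)
  = \frac{1}{n!}
  \quad \text{for every permutation } \pi.
\]
% Randomness thus splits into exchangeability of the ordering plus
% randomness of the configuration (the bag itself).
```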
Quantitative Approaches and Novel Results
The paper establishes rigorous quantitative bounds relating randomness and exchangeability predictors. The main theorem, for instance, shows that every false label excluded by a randomness predictor can also be excluded by a conformal predictor unless the data sequence itself is nonrandom, underscoring the broad applicability and robustness of conformal prediction.
Notably, the paper explores calibration techniques for transforming p-values into e-values and vice versa, clarifying how the two notions relate and suggesting optimal transformations. The empirical substantiation of these results holds promise for practical adoption in predictive analytics.
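As an illustration of such calibration, here is a sketch of two widely used calibrators: the family of p-to-e calibrators studied by Vovk and Wang, and the essentially unique admissible e-to-p calibrator. The paper's own optimal transformations may differ:

```python
def p_to_e(p, kappa=0.5):
    """p-to-e calibrator f(p) = kappa * p**(kappa - 1), valid for any
    kappa in (0, 1): f is nonincreasing and integrates to 1 over [0, 1],
    so f(P) is an e-variable whenever P is a p-variable."""
    assert 0 < kappa < 1 and 0 < p <= 1
    return kappa * p ** (kappa - 1.0)

def e_to_p(e):
    """e-to-p calibrator p = min(1, 1/e): by Markov's inequality,
    Pr(E >= 1/eps) <= eps for any e-variable E, so min(1, 1/E) is a
    p-variable. This is essentially the only admissible choice."""
    assert e >= 0
    return min(1.0, 1.0 / e) if e > 0 else 1.0

print(p_to_e(0.05))  # ~2.24: a modest e-value from p = 0.05
print(e_to_p(20.0))  # 0.05: e = 20 certifies p = 0.05
```

Note the asymmetry visible in the example: an e-value of 20 converts cleanly to p = 0.05, while p = 0.05 converts only to a modest e-value, reflecting the inevitable loss in p-to-e calibration.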
Future Directions and Implications
Potential research directions include investigating whether constants appearing in specific proofs, such as the Euler's constant factor, are optimal. Refining the optimal transformations along the red path of the taxonomy also provides a rich avenue for future investigation.
In conclusion, the paper offers a comprehensive examination of confidence predictors grounded in both randomness and exchangeability. It makes the case that such predictors can enhance predictive accuracy and reliability, especially in settings constrained to finite data sequences. The work marks a significant theoretical advance at the intersection of statistics and machine learning, with implications extending to algorithmic developments and practical data science applications.