Conformal Test Martingales (CTMs)
- CTMs are methods within the conformal prediction framework that transform conformity scores into p-values and martingale processes to evaluate exchangeability in data streams.
- They leverage the uniform distribution of p-values under exchangeability and employ Ville’s inequality to assess significant deviations from expected behavior.
- CTMs may fail to detect A-cryptic change-points when distributional shifts leave the conformity score law invariant, highlighting inherent limitations in their detection power.
Conformal Test Martingales (CTMs) are a foundational methodology within the conformal prediction framework for sequential testing of the exchangeability assumption in data streams. They convert sequences of conformity scores, derived from domain-specific or oracle-based measures, into p-values, and then into stochastic processes (martingales) that detect departures from the uniformity implied by exchangeability. Recent research has characterized both the strengths and the intrinsic limitations of CTMs, identifying classes of distributional shift that are fully cryptic to this methodology even for oracle measures, and thereby establishing a precise boundary on their statistical power (Szabadváry, 3 Jan 2026).
1. Formal Definition and Theoretical Basis
A CTM is defined relative to a sequence of data points $z_1, z_2, \dots$ and a conformity (or nonconformity) measure $A$ selected by the user. The measure $A$ maps each new point and the historical batch to a real-valued score $\alpha_n$, typically representing the “typicality” or density (oracle case) of the data conditional on the sequence so far. Each data point $z_n$ is then mapped to a “smoothed” conformal p-value $p_n$. Under exchangeability of $z_1, z_2, \dots$, the p-values $p_1, p_2, \dots$ form an IID sequence uniformly distributed on $[0,1]$.
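The mapping from scores to smoothed p-values can be sketched as follows. This is a minimal illustration, not the source's implementation: it assumes that a higher score means a more typical point, that each score does not depend on the other observations (as with an oracle measure), and that ties are broken with an independent uniform draw. The function name `smoothed_conformal_pvalues` is hypothetical.

```python
import numpy as np

def smoothed_conformal_pvalues(scores, rng):
    """Smoothed conformal p-values for a stream of conformity scores.

    Assumption: higher score = more typical, and scores[i] does not
    depend on the other observations (oracle-style measure).
    """
    scores = np.asarray(scores, dtype=float)
    pvals = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]             # scores observed so far, incl. current
        lower = np.sum(seen < scores[n])   # points strictly less conforming
        ties = np.sum(seen == scores[n])   # ties, including the point itself
        theta = rng.uniform()              # randomized tie-breaking
        pvals[n] = (lower + theta * ties) / (n + 1)
    return pvals

rng = np.random.default_rng(0)
scores = rng.normal(size=2000)             # IID (hence exchangeable) toy scores
pvals = smoothed_conformal_pvalues(scores, rng)
print("mean p-value:", pvals.mean())
```

On exchangeable input such as the toy scores above, the p-values should look uniform on $[0,1]$, so their mean should be close to 0.5.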
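A conformal test martingale is any nonnegative martingale $(S_n)_{n \ge 0}$ adapted to the filtration generated by the p-value sequence $p_1, p_2, \dots$ and initialized at $S_0 = 1$. That is,
$$S_n = \prod_{i=1}^{n} f_i(p_i)$$
for some score (betting) functions $f_i : [0,1] \to [0,\infty)$ that integrate to one over $[0,1]$ and may depend on $p_1, \dots, p_{i-1}$, typically chosen to upweight small (unlikely) p-values. Key properties include: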
- Under exchangeability, the process $S_n$ remains a true martingale and does not systematically grow.
- By Ville’s inequality, $\mathbb{P}(\exists n : S_n \ge c) \le 1/c$ for any threshold $c \ge 1$ under exchangeability. Thus, an observed $S_n$ greatly exceeding $1$ is strong evidence against exchangeability (Szabadváry, 3 Jan 2026).
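The properties above can be illustrated with a simple sketch. The power betting function $f(p) = \epsilon p^{\epsilon - 1}$ used here is one common choice and is an assumption of this example, not necessarily the one used in the source. The function names and the alarm level are likewise hypothetical. The martingale is tracked on the log scale to avoid overflow.

```python
import numpy as np

def power_martingale(pvals, epsilon=0.5):
    """Log-values of S_n = prod_i epsilon * p_i**(epsilon - 1), with log S_0 = 0."""
    # Clipping is a numerical guard against log(0) only.
    pvals = np.clip(np.asarray(pvals, dtype=float), 1e-12, 1.0)
    log_bets = np.log(epsilon) + (epsilon - 1.0) * np.log(pvals)
    return np.concatenate([[0.0], np.cumsum(log_bets)])

def first_alarm(log_martingale, level=0.01):
    """First index n with S_n >= 1/level, or None if never reached.

    By Ville's inequality, the chance of any alarm under
    exchangeability is at most `level`.
    """
    hits = np.nonzero(log_martingale >= np.log(1.0 / level))[0]
    return int(hits[0]) if hits.size else None

rng = np.random.default_rng(0)
null_p = rng.uniform(size=5000)        # uniform p-values: exchangeability holds
alt_p = rng.uniform(size=500) ** 3     # p-values skewed toward 0: a violation

log_s_null = power_martingale(null_p)
log_s_alt = power_martingale(alt_p)
print("final log S (uniform p-values):", log_s_null[-1])
print("first alarm (skewed p-values): ", first_alarm(log_s_alt))
```

With uniform p-values the expected log-bet is negative for this betting function, so the martingale tends to drift downward. With p-values concentrated near zero it grows and crosses the threshold $1/\text{level}$.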
2. Exchangeability, Uniformity, and the Limitations of the Converse
Exchangeability of the data sequence is not equivalent to the uniformity of conformal p-values, but rather strictly stronger. The classical conformal validity theorem asserts that exchangeability implies IID uniformity of the p-values:
$$z_1, z_2, \dots \ \text{exchangeable} \;\Longrightarrow\; p_1, p_2, \dots \overset{\text{IID}}{\sim} \mathrm{Uniform}(0,1).$$
However, the converse fails: one can construct sequences where exchangeability is violated, yet the resulting sequence of p-values, and hence the CTM process, continues to behave as if no change occurred. This phenomenon is termed conformal blindness or $A$-crypticity, highlighting that CTMs fundamentally test only for changes that affect the p-value distribution (Szabadváry, 3 Jan 2026).
3. $A$-Cryptic Change-Points and Their Explicit Construction
A central advance is the formalization of $A$-cryptic change-points. For two distributions $P_0$ and $P_1$ and a given conformity measure $A$, the pair $(P_0, P_1)$ is $A$-cryptic if, under a change from $P_0$ to $P_1$, the conformity score distribution (and hence the p-value law) is invariant:
$$\mathcal{L}_{z \sim P_0}\big(A(z)\big) = \mathcal{L}_{z \sim P_1}\big(A(z)\big),$$
where $\mathcal{L}$ denotes the law of the score and the dependence of $A$ on past observations is suppressed. This condition is sufficient to guarantee that all conformal p-values, and thus any CTM, are blind to the change. The phenomenon can be realized concretely even for oracle conformity measures: e.g., in a bivariate Gaussian setup with an oracle conformity measure, if the mean shift occurs precisely along the slope determined by the covariance matrix, the conditional score is identically distributed before and after the change. As a result, even a drastic change in distribution may be entirely undetectable by any CTM based on $A$ (Szabadváry, 3 Jan 2026).
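A short derivation may help show why a mean shift along the covariance-determined slope is invisible. The notation is assumed for illustration and is not taken from the source: the pre-change law is $P_0 = N(\mu, \Sigma)$ for a pair $(X, Y)$, the oracle score is taken to be the pre-change conditional density $f_0(y \mid x)$, and the post-change law $P_1$ keeps $\Sigma$ but moves the mean by $t(1, \beta)$, where $\beta$ is the regression slope.

```latex
\begin{align*}
\beta &= \frac{\sigma_{xy}}{\sigma_x^2}, \qquad
\sigma_{y\mid x}^2 = \sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}, \\
P_0:\quad \mathbb{E}[Y \mid X = x] &= \mu_y + \beta\,(x - \mu_x), \\
P_1:\quad \mathbb{E}[Y \mid X = x] &= (\mu_y + t\beta) + \beta\,\bigl(x - (\mu_x + t)\bigr)
   = \mu_y + \beta\,(x - \mu_x), \\
R &:= Y - \mu_y - \beta\,(X - \mu_x) \sim N\bigl(0, \sigma_{y\mid x}^2\bigr)
   \quad \text{under both } P_0 \text{ and } P_1, \\
A(X, Y) &= f_0(Y \mid X)
   = \frac{1}{\sqrt{2\pi\,\sigma_{y\mid x}^2}}
     \exp\!\Bigl(-\frac{R^2}{2\,\sigma_{y\mid x}^2}\Bigr).
\end{align*}
```

Under these assumptions the score depends on the data only through the residual $R$, whose law is the same before and after the change, so the score law is invariant. The marginal of $X$ nevertheless moves by $t$, so $P_0 \neq P_1$ whenever $t \neq 0$.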
4. Simulation Studies Demonstrating Conformal Blindness
Empirical studies corroborate the existence and impact of $A$-cryptic change-points. The canonical setup entails pre-change samples drawn IID from a distribution $P_0$, then a post-change phase drawn from a different distribution $P_1$, with two regimes:
- Non-cryptic shift (mean shift off the cryptic line): the CTM grows rapidly, the p-value histogram clearly departs from uniformity, and conformal prediction intervals widen dramatically.
- Cryptic shift (mean shift on the cryptic line): the p-value histogram remains consistent with uniformity, the martingale remains near 1, and conformal intervals stay optimally tight, despite full statistical distinction between the underlying distributions (Szabadváry, 3 Jan 2026).
This demonstrates that even unbounded deviations in data distribution can be rendered invisible to a CTM if they leave the conformity-score law invariant relative to the chosen conformity measure $A$.
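The two regimes can be reproduced in a small simulation. This is a sketch under stated assumptions, not the source's experimental code. It assumes a bivariate Gaussian with zero pre-change mean and correlation 0.8, an oracle score equal to the pre-change conditional density of the second coordinate given the first, a power betting function with exponent 0.5, and illustrative shift sizes. All names and parameter values are hypothetical.

```python
import numpy as np

MU0 = np.zeros(2)                           # assumed pre-change mean
COV = np.array([[1.0, 0.8], [0.8, 1.0]])    # assumed covariance, shared pre/post
BETA = COV[0, 1] / COV[0, 0]                # regression slope of y on x

def oracle_scores(xy):
    """Pre-change conditional density f0(y | x), used as conformity score."""
    var = COV[1, 1] - COV[0, 1] ** 2 / COV[0, 0]
    resid = xy[:, 1] - (MU0[1] + BETA * (xy[:, 0] - MU0[0]))
    return np.exp(-0.5 * resid**2 / var) / np.sqrt(2.0 * np.pi * var)

def smoothed_pvalues(scores, rng):
    """Smoothed conformal p-values (higher score = more typical)."""
    pvals = np.empty(len(scores))
    for n, s in enumerate(scores):
        seen = scores[: n + 1]
        theta = rng.uniform()               # randomized tie-breaking
        pvals[n] = (np.sum(seen < s) + theta * np.sum(seen == s)) / (n + 1)
    return pvals

def log_power_martingale(pvals, eps=0.5):
    """Running log of S_n = prod_i eps * p_i**(eps - 1)."""
    pvals = np.clip(pvals, 1e-12, 1.0)      # numerical guard against log(0)
    return np.cumsum(np.log(eps) + (eps - 1.0) * np.log(pvals))

def simulate(shift, seed, n_pre=500, n_post=500):
    """Return post-change p-values and the full log-martingale path."""
    rng = np.random.default_rng(seed)
    pre = rng.multivariate_normal(MU0, COV, size=n_pre)
    post = rng.multivariate_normal(MU0 + shift, COV, size=n_post)
    pvals = smoothed_pvalues(oracle_scores(np.vstack([pre, post])), rng)
    return pvals[n_pre:], log_power_martingale(pvals)

# Cryptic: mean moves along (1, BETA). Non-cryptic: mean moves in y only.
cryptic_p, cryptic_log_s = simulate(5.0 * np.array([1.0, BETA]), seed=1)
visible_p, visible_log_s = simulate(np.array([0.0, 3.0]), seed=1)

print("cryptic: mean post-change p =", cryptic_p.mean(),
      "final log S =", cryptic_log_s[-1])
print("visible: mean post-change p =", visible_p.mean(),
      "final log S =", visible_log_s[-1])
```

In the cryptic run the post-change p-values should stay close to uniform and the log-martingale should not grow, even though the mean has moved by several standard deviations. In the non-cryptic run the post-change p-values concentrate near zero and the log-martingale climbs well past typical alarm thresholds.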
5. Implications for Conformal Testing and Methodological Remediation
The $A$-crypticity phenomenon imposes a strict limitation: CTMs can, by construction, only detect departures from exchangeability that disturb the law of conformity scores as encoded by $A$. Consequently, the coverage guarantees of conformal prediction under exchangeability and the detectability of genuine distributional shifts are coupled. Further, the practical power of CTMs is governed as much by the choice of conformity measure as by the underlying testing protocol.
Plausible implications: Using multiple or ensemble conformity measures may increase robustness to cryptic shifts, as at least one measure might be sensitive to the change. Parallel CTMs or copula-based joint tests on p-value vectors have been suggested to broaden detection (Szabadváry, 3 Jan 2026). Another open problem is the systematic characterization of all possible $A$-cryptic pairs for a given $A$, and investigation of whether adaptive (online, transductive) calibration offers additional protection against conformal blindness.
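One simple way to run several CTMs in parallel is sketched below. This is an illustrative assumption, not the joint test proposed in the source: each of $K$ martingales (one per conformity measure) is monitored against the threshold $K/\delta$, so that Ville's inequality combined with a union bound keeps the overall false-alarm probability under exchangeability at most $\delta$. The function name `parallel_alarm` is hypothetical.

```python
import numpy as np

def parallel_alarm(log_martingales, delta=0.01):
    """Monitor K log-martingale paths with a union-bound threshold.

    log_martingales: array of shape (K, T), row k holding log S^k_n.
    Returns (first alarm index or None, log-threshold used).
    """
    log_m = np.atleast_2d(np.asarray(log_martingales, dtype=float))
    k = log_m.shape[0]
    threshold = np.log(k / delta)                 # each path gets level delta / K
    crossed = (log_m >= threshold).any(axis=0)    # any path over the threshold
    hits = np.nonzero(crossed)[0]
    return (int(hits[0]) if hits.size else None), threshold

# Toy paths: one measure is blind (flat), the other reacts (steady growth).
blind = np.zeros(11)
sensitive = np.arange(11, dtype=float)
alarm, log_threshold = parallel_alarm([blind, sensitive], delta=0.01)
print("alarm index:", alarm, "log-threshold:", log_threshold)
```

The union bound is conservative, since it ignores dependence between the martingales, but it needs no assumptions about how the conformity measures relate to each other. A shift that is cryptic to every measure in the ensemble remains undetectable.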
6. Comparative Summary of Properties and Applications
| Aspect | Under Exchangeability | Under $A$-Cryptic Shift |
|---|---|---|
| Conformal p-values | IID Uniform(0,1) | Remain IID Uniform(0,1) |
| CTM Martingale | Non-explosive, stable | Remains stable or drifts downward |
| Conformal Intervals | Efficient (small width) | Remain efficient |
| Type of detectable change | Any affecting the $A$-score law | None; shift leaves the $A$-score law invariant |
Applications of CTMs include online change-point detection, sequential monitoring of model validity, and real-time exchangeability assessment. Their efficacy, however, is strictly constrained by the alignment of the conformity measure with the types of distributional shifts that might occur (Szabadváry, 3 Jan 2026).
7. Open Problems and Directions for Further Research
- Systematic classification of $A$-cryptic change-points for arbitrary conformity measures and data models.
- Development of ensemble, dynamically adaptive, or joint testing frameworks that maintain coverage guarantees while mitigating conformal blindness.
- Theoretical investigation into the limits of detectability for structured adaptive calibrators (transductive variants).
- Quantification of practical trade-offs in prediction-set efficiency versus detection power for different conformity measures.
This body of research establishes both the sharp validity and the precise limitations of CTMs, positioning conformal blindness as a fundamental phenomenon intrinsic to the mechanism of conformal inference (Szabadváry, 3 Jan 2026).