Split Conformal Prediction
- Split Conformal Prediction is a distribution-free framework that splits data into training and calibration sets to provide finite-sample validity for predictive intervals.
- It leverages empirical quantiles from calibration scores to form prediction sets with rigorous marginal coverage guarantees under exchangeable data.
- Extensions and adaptations address challenges like heteroskedasticity, contamination, and high-dimensional outputs while maintaining computational efficiency.
Split Conformal Prediction (CP) is a computationally efficient, distribution-free framework for constructing predictive intervals or sets with finite-sample validity. Given a user-specified miscoverage level $\alpha \in (0, 1)$ and exchangeable data, split CP yields rigorous marginal coverage guarantees with minimal structural or distributional assumptions. Unlike full conformal prediction, which requires repeated retraining or recomputation for each candidate output, split CP relies on a one-time division of the data into a proper training set and a calibration set, enabling broad applicability with low computational overhead.
1. Formal Definition and Statistical Guarantees
Given data pairs $(X_i, Y_i)$, split CP operates by partitioning the available data into a training set (used to fit a “base” or “black-box” predictor $\hat{\mu}$) and a calibration set of size $n$. For a nonconformity (score) function $s(x, y)$, typically a residual or predictive discrepancy such as $s(x, y) = |y - \hat{\mu}(x)|$, the calibration scores $S_i = s(X_i, Y_i)$, $i = 1, \dots, n$, are computed. For a new covariate $X_{n+1}$, the split conformal prediction set at level $1 - \alpha$ is
$$\hat{C}(X_{n+1}) = \{\, y : s(X_{n+1}, y) \le S_{(k)} \,\}, \qquad k = \lceil (1 - \alpha)(n + 1) \rceil,$$
where $S_{(k)}$ is the $k$-th smallest calibration score. Under exchangeability and regularity (no ties), for a future pair $(X_{n+1}, Y_{n+1})$ the coverage guarantee is
$$1 - \alpha \;\le\; \mathbb{P}\big(Y_{n+1} \in \hat{C}(X_{n+1})\big) \;\le\; 1 - \alpha + \frac{1}{n + 1}.$$
This holds for arbitrary (possibly misspecified) predictors and is entirely distribution-free (Hulsman, 2022).
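As a concrete numerical check of these definitions (a minimal sketch; the values $n = 99$, $\alpha = 0.1$ are illustrative, not from the cited work): the prediction set uses the 90th smallest of 99 calibration scores, and marginal coverage lands in $[0.90, 0.91]$.

```python
import math

n, alpha = 99, 0.1                      # calibration size, miscoverage level
k = math.ceil((1 - alpha) * (n + 1))    # rank of the calibration score used
lower = 1 - alpha                       # marginal coverage lower bound
upper = 1 - alpha + 1 / (n + 1)         # upper bound (assuming no ties)
```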
The full finite-sample distribution of empirical coverage for a batch of $m$ future draws is exactly characterized: the number of covered points in the batch, $W$, is Beta–Binomial, $W \sim \mathrm{BetaBin}(m,\, n + 1 - l,\, l)$ with $l = \lfloor \alpha (n + 1) \rfloor$. As $m \to \infty$, the almost sure limit of the empirical coverage is $\mathrm{Beta}(n + 1 - l,\, l)$ (F, 2023).
2. Exact Coverage Distributions and Calibration Size Selection
The universality of split CP coverage arises from the exchangeability of calibration and test scores. Specifically, for batch size $m$:
$$W \sim \mathrm{BetaBin}(m,\, a,\, b),$$
and, for infinite batches,
$$\bar{C}_\infty \sim \mathrm{Beta}(a,\, b),$$
where $a = n + 1 - l$, $b = l$, $l = \lfloor \alpha (n + 1) \rfloor$ (F, 2023).
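This law can be queried directly. The sketch below assumes `scipy` and its `betabinom` distribution, parameterized with the shape parameters as above; the concrete numbers are illustrative.

```python
from scipy.stats import betabinom

alpha, n, m = 0.1, 99, 500      # miscoverage, calibration size, batch size
l = int(alpha * (n + 1))        # l = floor(alpha * (n + 1)) = 10
a, b = n + 1 - l, l             # Beta-Binomial shape parameters (90, 10)
W = betabinom(m, a, b)          # law of the number of covered batch points
mean_cov = W.mean() / m         # expected empirical coverage = a/(a+b) = 0.9
spread = W.std() / m            # scale of batch-to-batch coverage fluctuation
```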
This closed-form law enables principled calibration sample size selection. If one wishes the limiting empirical coverage to lie within $\epsilon$ of $1 - \alpha$ with probability at least $1 - \delta$, it suffices to choose the smallest $n$ such that
$$I_{1 - \alpha + \epsilon}(a, b) - I_{1 - \alpha - \epsilon}(a, b) \;\ge\; 1 - \delta,$$
where $I_x(a, b)$ is the regularized incomplete Beta function. Precomputed tables for common settings are provided in (F, 2023).
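A numerical search over this criterion is straightforward. The routine below is an illustrative sketch, not code from the cited paper; it assumes `scipy`, whose Beta CDF is exactly the regularized incomplete Beta function.

```python
import math
from scipy.stats import beta

def min_calibration_size(alpha, eps, delta, n_max=20000):
    """Smallest n such that the limiting coverage Beta(n+1-l, l),
    l = floor(alpha * (n+1)), lies within eps of 1 - alpha with
    probability at least 1 - delta."""
    for n in range(1, n_max):
        l = math.floor(alpha * (n + 1))
        if l < 1:
            continue  # quantile rank degenerate; no nontrivial guarantee
        a, b = n + 1 - l, l
        # P(1 - alpha - eps <= Beta(a, b) <= 1 - alpha + eps)
        prob = beta.cdf(1 - alpha + eps, a, b) - beta.cdf(1 - alpha - eps, a, b)
        if prob >= 1 - delta:
            return n
    return None

n_star = min_calibration_size(alpha=0.1, eps=0.02, delta=0.05)
```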
3. Algorithmic Workflow
The generic split CP workflow consists of:
- Fitting any base model on a training subset.
- Computing nonconformity or residual scores for each calibration point.
- Determining the empirical quantile threshold $\hat{q} = S_{(k)}$, $k = \lceil (1 - \alpha)(n + 1) \rceil$.
- For any new $x$, forming the interval or set $\hat{C}(x) = \{\, y : s(x, y) \le \hat{q} \,\}$ (for absolute-residual scores, the interval $[\hat{\mu}(x) - \hat{q},\, \hat{\mu}(x) + \hat{q}]$).
This yields a finite-sample marginal coverage guarantee and, under certain conditions, coverage that is only minimally conservative (gap at most $1/(n+1)$) (Hulsman, 2022, F, 2023).
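The steps above can be condensed into a short sketch. The constant-mean base model and absolute-residual score are placeholder choices for illustration, since split CP works with any fitted predictor and score function.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """k-th smallest calibration score, k = ceil((1 - alpha)(n + 1))."""
    n = len(cal_scores)
    k = int(np.ceil((1 - alpha) * (n + 1)))
    if k > n:                  # calibration set too small: trivial (infinite) set
        return np.inf
    return np.sort(cal_scores)[k - 1]

rng = np.random.default_rng(0)
y = rng.normal(size=3000)
y_train, y_cal, y_test = y[:1000], y[1000:2000], y[2000:]

mu = y_train.mean()                                 # step 1: fit the base model
cal_scores = np.abs(y_cal - mu)                     # step 2: calibration scores
q_hat = conformal_quantile(cal_scores, alpha=0.1)   # step 3: quantile threshold
# step 4: interval [mu - q_hat, mu + q_hat]; check empirical test coverage
coverage = np.mean(np.abs(y_test - mu) <= q_hat)
```

On exchangeable draws like these, `coverage` concentrates near the nominal level $1 - \alpha = 0.9$.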
4. Extensions, Variants, and Limitations
Split CP is broadly extensible:
- Tolerance regions: A duality exists between marginal coverage and tolerance regions, with the distribution of conditional coverage controlled via Binomial and Beta relations. Stronger, tolerance-style coverage (content at least $1 - \alpha$ with confidence $1 - \delta$ over the calibration draw) is achieved by adjusting the quantile selection scheme (Hulsman, 2022).
- Full vs. split conformal: In full conformal prediction, adaptation to local heteroskedasticity is possible but computationally infeasible for complex models. Split CP—via sample splitting—trades off some statistical efficiency and local adaptivity for tractability (Tailor et al., 27 Jul 2025).
- Localized and smoothed split CP: Modifications such as split-localized conformal prediction (SLCP) subtract a local conditional quantile from each score to achieve approximate conditional coverage, while smoothing-based SCD-split merges disconnected conformal intervals for interpretability without sacrificing marginal coverage (Han et al., 2022, Zheng et al., 26 Sep 2025).
- Predictive distributions: Split conformal predictive systems (SCPS) generalize set-valued prediction to distributional calibration, leveraging either “randomized” or “crisp” splits. Distributional calibration is guaranteed under exchangeability (Vovk et al., 2019, Vovk et al., 2019).
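The score re-centering idea behind such localized variants can be illustrated schematically. The k-NN running median below is a hypothetical stand-in for the local conditional quantile estimator, not the construction of the cited papers.

```python
import numpy as np

def recenter_scores(x_cal, resid_cal, k=50):
    """Subtract a local (k-NN in x) median from each absolute residual,
    so the re-centered scores vary less with x even when the residual
    scale is input-dependent. (Illustrative stand-in for a local
    conditional quantile estimator; the re-centered scores would then
    be conformalized as in plain split CP.)"""
    order = np.argsort(x_cal)
    r = np.abs(resid_cal)[order]
    local_med = np.array([
        np.median(r[max(0, i - k): i + k + 1]) for i in range(len(r))
    ])
    return r - local_med

# Heteroskedastic toy data: residual scale grows linearly with x.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 3000)
resid = x * rng.normal(size=3000)
raw = np.abs(resid)
recentered = recenter_scores(x, resid)
```

Here the re-centered scores have visibly smaller spread than the raw residual scores, which is the adaptivity that a single global quantile cannot provide.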
Limitations include:
- Statistical inefficiency due to unused data in either model fitting or calibration, inflating interval widths when calibration size is small.
- Poor adaptation to heteroskedastic or input-dependent uncertainty, as a data-independent quantile is used for all new predictions (Tailor et al., 27 Jul 2025).
- Inability to guarantee strong forms of conditional or subgroup coverage without further localization or smoothing (Hulsman, 2022, Han et al., 2022).
5. Universal Validity and Robustness
The coverage guarantees of split CP are entirely universal under exchangeability. However, in non-exchangeable (dependent or contaminated) settings:
- Non-exchangeable data: Under weak dependence (e.g., $\beta$-mixing), a small explicit penalty term $\eta$ can be computed, and the adjusted scheme achieves coverage at least $1 - \alpha - \eta$ (Oliveira et al., 2022).
- Markovian dependence: When applied to Markov data, split CP's coverage gap scales with the chain's mixing time $t_{\mathrm{mix}}$ relative to the calibration size. Thinning the calibration set so that retained points are approximately independent (K-split CP) shrinks this gap (Zheng et al., 2024).
- Data contamination: Under Huber-$\epsilon$ contamination, the coverage deviation from the nominal level is bounded in terms of the contamination fraction $\epsilon$ and the Kolmogorov–Smirnov distance between the clean and contaminant calibration score distributions. In classification, explicit CR-CP schemes estimate transition effects and maintain coverage guarantees under label noise (Clarkson et al., 2024).
These properties enable distribution-free finite-sample guarantees even in weakly dependent or partially contaminated regimes, provided the penalty or adjustment is accounted for.
6. Practical Implementation and Applications
Split CP is model-agnostic; it can be applied atop any black-box predictor (regression, classification, structured output, functional data, or even outputs from neural operators or LLMs) as long as a suitable score function can be defined (Tailor et al., 27 Jul 2025, Xu, 30 Aug 2025, Millard et al., 4 Sep 2025). For high-dimensional, functional, or non-Euclidean output spaces, relevant split CP variants have been developed (F. et al., 2024, Millard et al., 4 Sep 2025).
Efficient implementations are available for both batch-mode and streaming, for point-valued intervals, set-valued predictions, and distributional outputs. Extensions to multiple random splits (multi-split CP) can reduce result variability at the cost of some conservativeness (Solari et al., 2021). Recent developments include unsupervised calibration methodologies for the scenario when calibration labels are unavailable (Mazuelas, 8 Oct 2025).
7. Summary Table: Key Statistical Objects and Laws
| Coverage Concept | Law/distribution | Parameters |
|---|---|---|
| Empirical batch coverage | Beta–Binomial | $m$, $a = n + 1 - l$, $b = l$, $l = \lfloor \alpha(n+1) \rfloor$ |
| Infinite-batch limit | Beta | $a$, $b$ as above |
| Marginal finite-sample | $1 - \alpha \le \mathbb{P}(\text{cover}) \le 1 - \alpha + 1/(n+1)$ | $n$, $\alpha$ |
| Tolerance coverage | Determined via Binomial inversion | $\alpha$, $\delta$ |
Empirical coverage laws are fully determined by the nominal miscoverage $\alpha$ and calibration size $n$, regardless of data distribution or model choice (exchangeability required) (F, 2023, Hulsman, 2022).
Split conformal prediction is a universally valid, computationally straightforward mechanism for set-valued or distributional uncertainty quantification in modern statistical learning, offering precise finite-sample control under minimal assumptions. Its universality, robustness to weak dependence and contamination, and extensibility to complex outputs and calibration schemes form the foundation for its rapidly expanding application in high-stakes predictive inference (F, 2023, Hulsman, 2022, Tailor et al., 27 Jul 2025, Zheng et al., 26 Sep 2025, Zheng et al., 2024, Clarkson et al., 2024).