- The paper presents RCP1, achieving robust conformal prediction by certifying the complete procedure with just one perturbed input sample per test instance.
- It establishes closed-form, conservative coverage guarantees under adversarial perturbations, eliminating the need for costly Monte Carlo sampling.
- Empirical evaluations show that RCP1 significantly reduces computation while matching or surpassing the efficiency and accuracy of multi-sample methods.
The paper introduces RCP1, a methodology that renders conformal prediction robust to adversarial input perturbations using only a single randomized input sample per test instance. The dominant paradigm in robust conformal prediction (RCP) has relied on randomized smoothing, which improves robustness by repeated averaging over multiple noisy copies of the input, but at significant computational cost. The central claim is that, by directly certifying the conformal prediction procedure—as opposed to individual scores or outputs—substantial robustness can be achieved with a single noise-augmented forward pass, maintaining competitive set efficiency and coverage guarantees with drastically less computation.
Theoretical Contributions
The main theoretical advancement is the observation that conformal prediction under randomized smoothing inherently enjoys a degree of robustness, even when the smoothed score is computed from only one noise-perturbed input. The authors formalize this via a conservative guarantee: for any user-selected risk level α and any black-box predictive model, the robust coverage under an adversarial ℓ2-bounded perturbation of radius r is lower-bounded by a quantity c[1−α, r]. The certificate c[β, r] is monotone and convex in the nominal coverage β, admits a closed form for Gaussian smoothing with scale σ—specifically, c[β, r] = Φ_σ(Φ_σ^{-1}(β) − r), where Φ_σ is the CDF of N(0, σ²)—and is efficiently computable for arbitrary smoothing schemes and threat models.
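As a concrete illustration, the Gaussian closed form above can be evaluated directly with standard libraries. This is a minimal sketch, not the paper's code; the function name and parameters are illustrative:

```python
from scipy.stats import norm

def gaussian_certificate(beta: float, r: float, sigma: float) -> float:
    """Closed-form certificate for Gaussian smoothing:
        c[beta, r] = Phi_sigma(Phi_sigma^{-1}(beta) - r),
    where Phi_sigma is the CDF of N(0, sigma^2). Lower-bounds the
    robust coverage under an l2 perturbation of radius r."""
    return norm.cdf(norm.ppf(beta, scale=sigma) - r, scale=sigma)

# The certificate is tight at r = 0 (c[beta, 0] == beta) and decreases
# monotonically as the adversarial radius r grows.
print(gaussian_certificate(0.9, r=0.0, sigma=0.25))   # ~0.9
print(gaussian_certificate(0.9, r=0.25, sigma=0.25))  # strictly below 0.9
```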
Importantly, the robust coverage guarantee provided by RCP1 can be enforced by adjusting the calibration quantile used in threshold selection, so that after certification, the coverage remains at least 1−α for perturbed data. This is achievable by increasing the nominal coverage parameter in standard CP to 1−α′, where c[1−α′,r]≥1−α.
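Under the Gaussian certificate, the adjusted level can be obtained in closed form by inverting c[β, r] (a sketch under the same assumptions as above; the function name is illustrative):

```python
from scipy.stats import norm

def adjusted_miscoverage(alpha: float, r: float, sigma: float) -> float:
    """Nominal miscoverage alpha' to use during calibration so that the
    certified robust coverage satisfies c[1 - alpha', r] >= 1 - alpha.
    Obtained by inverting c[beta, r] = Phi_sigma(Phi_sigma^{-1}(beta) - r):
        1 - alpha' = Phi_sigma(Phi_sigma^{-1}(1 - alpha) + r)."""
    return 1.0 - norm.cdf(norm.ppf(1.0 - alpha, scale=sigma) + r, scale=sigma)

# Larger radii demand a smaller alpha', i.e. a more conservative quantile.
print(adjusted_miscoverage(0.1, r=0.125, sigma=0.25))
```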
A critical distinction with prior methods is that RCP1 requires no Monte Carlo sampling to estimate smoothed statistics (means, quantiles, etc.) at test time, in contrast to BinCP or RSCP, which typically require 70–256 model evaluations per input to maintain tight set size and accurate coverage. In environments where model inference is expensive, or sample budgets are constrained, this provides a practical advantage.
Implementation Details
RCP1 is black-box, model- and distribution-agnostic, and compatible with any exchangeable calibration design. At test time, each input is perturbed with a single draw from the smoothing distribution (e.g., additive Gaussian noise). During calibration, the same procedure is applied to the calibration set. The conformal prediction thresholds are then adjusted via the certified lower bound to ensure robust coverage. The approach is compatible with a variety of smoothing schemes beyond isotropic Gaussian—including Laplace and uniform noise, and more general norm-ball threat models—via the generalized certification approach presented.
Implementation is straightforward:
The critical calibration step is to select the threshold τ not as the 1−α quantile of the smoothed calibration scores, but as the quantile corresponding to the inflated coverage parameter 1−α′, with α′ chosen so that c[1−α′, r] ≥ 1−α.
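Putting the calibration and test-time steps together, an end-to-end sketch might look as follows. This assumes split-conformal classification with nonconformity scores where smaller means more conforming; `scores_fn`, `scores_all_fn`, and the data layout are placeholders, not the paper's API:

```python
import numpy as np
from scipy.stats import norm

def rcp1_calibrate(scores_fn, X_cal, y_cal, alpha, r, sigma, rng):
    """Calibrate with ONE Gaussian-perturbed draw per calibration point,
    then take the quantile at the inflated level 1 - alpha', where
    1 - alpha' = Phi_sigma(Phi_sigma^{-1}(1 - alpha) + r) guarantees
    c[1 - alpha', r] >= 1 - alpha."""
    X_noisy = X_cal + rng.normal(0.0, sigma, size=X_cal.shape)
    cal_scores = scores_fn(X_noisy, y_cal)  # one score per calibration point
    level = norm.cdf(norm.ppf(1.0 - alpha, scale=sigma) + r, scale=sigma)
    n = len(cal_scores)
    # Finite-sample split-conformal quantile at the inflated level.
    q_level = min(1.0, np.ceil((n + 1) * level) / n)
    return np.quantile(cal_scores, q_level, method="higher")

def rcp1_predict(scores_all_fn, x, threshold, sigma, rng):
    """Prediction set from a single perturbed forward pass: include every
    label whose score on the one noisy copy falls below the threshold."""
    x_noisy = x + rng.normal(0.0, sigma, size=x.shape)
    s = scores_all_fn(x_noisy)  # one score per candidate label
    return set(np.where(s <= threshold)[0])
```

The key departure from multi-sample RCP methods is visible in both functions: each input is evaluated on exactly one noisy copy, and all robustness accounting happens in the inflated quantile level rather than in Monte Carlo averaging.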
Empirical Results
Empirical evaluation is performed on CIFAR-10 and ImageNet using both conventional (ResNet) and compute-intensive (Diffusion + ViT) pipelines. Results highlight:
- Efficiency: RCP1 closely matches or surpasses the set efficiency (average set size) of state-of-the-art robust conformal predictors at sample rates where the competitors are feasible, while reducing computational cost by one to two orders of magnitude.
- Coverage: RCP1’s empirical robust coverage consistently satisfies or slightly exceeds the user-specified target, with theoretical guarantees tightly matching observed coverage in all experiments.
- Practicality: RCP1 enables use of much larger and more accurate downstream models (e.g., ViT, BEiT-L), as inference is now dominated by a single model call per input.
Notably, RCP1 achieves robust prediction sets of similar size to BinCP (with 64 or more samples) using only one sample per input; this unlocks deployment in real-time, resource-constrained, or high-throughput environments.
Limitations and Trade-offs
- The robust coverage guarantee of RCP1 is slightly conservative, since it rests on a worst-case certificate over the coverage probability and does not exploit problem-specific distributional information.
- For large sample rates or unconstrained compute environments, BinCP or RSCP can produce slightly smaller sets through tighter estimation of the smoothed statistic. RCP1 is not intended to outperform these methods asymptotically, but rather to provide a compute-efficient alternative.
- The adversarial guarantee is valid on average over the distribution of (randomized) conformal sets; adversarial manipulation can, in principle, break coverage for some fixed noise draws.
Extensions and Broader Implications
The methodology generalizes immediately to robust conformal regression and robust risk-controlling prediction, using the same certificate machinery for interval and mask predictions. The recipe for building certificates, combined with affordable single-pass inference, opens a path for robust uncertainty quantification in domains where sample-intensive classical smoothing methods are infeasible.
From a theoretical viewpoint, the paper underscores the role of the conformal prediction procedure itself as a robustification mechanism under randomized input transformations, beyond what is captured by examining mean or quantile smoothed statistics alone. This shifts the focus from tailored score-level certification to global, coverage-based guarantees in the design of robust predictors.
Future Directions
- Extension to structured outputs and settings with weak or non-exchangeable calibration sets.
- Developing adaptive smoothing schemes to optimize the accuracy-robustness trade-off per input instance.
- Empirical study of RCP1 in large-scale LLMs, time-series, or datasets with significant distribution shift.
- Exploration of stronger instance-wise guarantees, perhaps via hybrid methodology (low-rate smoothing plus CP certification).
Overall, RCP1 provides a paradigm shift for practical robust uncertainty quantification, rendering smoothing-based robust conformal prediction accessible at scale, with broad applicability to real-world robust AI deployments.