Non-monotonicity in Conformal Risk Control

Published 2 Apr 2026 in stat.ML and cs.LG | (2604.01502v1)

Abstract: Conformal risk control (CRC) provides distribution-free guarantees for controlling the expected loss at a user-specified level. Existing theory typically assumes that the loss decreases monotonically with a tuning parameter that governs the size of the prediction set. This assumption is often violated in practice, where losses may behave non-monotonically due to competing objectives such as coverage and efficiency. We study CRC under non-monotone loss functions when the tuning parameter is selected from a finite grid, a common scenario in thresholding or discretized decision rules. Revisiting a known counterexample, we show that the validity of CRC without monotonicity depends on the relationship between the calibration sample size and the grid resolution. In particular, risk control can still be achieved when the calibration sample is sufficiently large relative to the grid. We provide a finite-sample guarantee for bounded losses over a grid of size $m$, showing that the excess risk above the target level $α$ is of order $\sqrt{\log(m)/n}$, where $n$ is the calibration sample size. A matching lower bound shows that this rate is minimax optimal. We also derive refined guarantees under additional structural conditions, including Lipschitz continuity and monotonicity, and extend the analysis to settings with distribution shift via importance weighting. Numerical experiments on synthetic multilabel classification and real object detection data illustrate the practical impact of non-monotonicity. Methods that account for finite-sample deviations achieve more stable risk control than approaches based on monotonicity transformations, while maintaining competitive prediction-set sizes.


Authors (3)

Summary

The paper introduces explicit finite-sample, minimax-optimal excess risk bounds for non-monotonic losses in conformal risk control.
The paper demonstrates that risk control deteriorates with increased grid resolution unless the calibration sample size scales appropriately.
The paper empirically compares CRC-NM with alternative methods, highlighting improved stability and efficiency in both synthetic and real-world settings.


Introduction

This work rigorously analyzes Conformal Risk Control (CRC) procedures when the loss monotonicity assumption is violated, establishing distribution-free finite-sample guarantees for non-monotonic loss functions within conformal inference. CRC, an extension of conformal prediction, selects set-valued predictors satisfying $\mathbb{E}\left[\ell(X_{n+1}, Y_{n+1}; \hat{\lambda})\right] \le \alpha$ for a user-specified loss $\ell$ and target level $\alpha$ by tuning $\hat{\lambda}$ over a parameter grid $\Lambda$. Classical CRC theory requires $\ell(\cdot, \cdot; \lambda)$ to be non-increasing in $\lambda$, but practical losses, especially those balancing competing objectives such as coverage versus efficiency in object detection or fairness metrics, often behave non-monotonically. The primary contribution is a set of explicit finite-sample, minimax-optimal excess risk bounds for such settings, obtained through concentration inequalities and empirical process arguments.

Failure Modes of CRC under Non-Monotone Losses

The paper first sharpens a known counterexample in which CRC fails to guarantee risk control without monotonicity (2604.01502). When $\hat{\lambda}$ is selected from a discrete grid of size $m$ using $n$ calibration samples, the excess risk above $\alpha$ scales as $\sqrt{\log(m)/n}$, so risk control can fail if the calibration sample size $n$ does not grow with the grid resolution $m$. Specifically, when $m$ is large relative to $n$, probability mass accumulates on riskier selections, and the selected threshold no longer guarantees the nominal target.
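A minimal Monte Carlo sketch of this selection effect, under a deliberately stylized and hypothetical model (not the paper's counterexample): each of the $m$ grid points has true risk $\alpha + 0.02$, losses are independent across grid points, and a hypothetical fallback point (for example, the full label set) has zero loss. A naive rule that returns the first grid point whose empirical risk is at most $\alpha$ certifies an invalid threshold more and more often as $m$ grows, so for fixed $n$ its expected test risk climbs above $\alpha$.

```python
import numpy as np

def naive_crc_risk(n, m, alpha=0.1, gap=0.02, trials=4000, seed=0):
    """Expected test risk of a naive selector in a stylized, hypothetical model.

    Every one of the m grid points has true risk alpha + gap, with losses
    independent across grid points; a fallback point (e.g. the full label set)
    has zero loss. The naive rule returns the first grid point whose empirical
    risk is <= alpha, and the fallback if there is none.
    """
    rng = np.random.default_rng(seed)
    p = alpha + gap
    # counts[t, j] = number of calibration losses equal to 1 at grid point j in trial t
    counts = rng.binomial(n, p, size=(trials, m))
    certified = (counts / n <= alpha).any(axis=1)
    # A certified grid point has true risk p; the fallback has true risk 0.
    return certified.mean() * p

for m in (1, 10, 100, 1000):
    print(f"m={m:4d}  expected test risk ~ {naive_crc_risk(n=200, m=m):.3f}")
```

With $n = 200$ the estimate stays well below $\alpha$ for a single grid point but exceeds $\alpha$ once the grid is large, which is the qualitative dependence on $m$ relative to $n$ described above.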

Main Theoretical Results: Minimax-Optimal Risk Bounds

The main technical result is a finite-sample upper bound for bounded, non-monotonic losses over a grid: the excess risk incurred by data-dependent parameter selection is at most $C\sqrt{\log(m)/n}$ for a universal constant $C$. The proof combines uniform concentration inequalities (Hoeffding/Bernstein and their empirical variants, see appendix) with a union bound over candidate thresholds:

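Theorem (informal): If $\ell$ is bounded (rescaled to take values in $[0, 1]$) and $\Lambda$ is a grid of $m$ candidate values, then for the conformal selector $\hat{\lambda}$,
$$\mathbb{E}\left[\ell(X_{n+1}, Y_{n+1}; \hat{\lambda})\right] \le \alpha + C\sqrt{\frac{\log m}{n}}.$$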
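To see where the rate comes from, here is a minimal derivation with an explicit constant (not necessarily the paper's). It assumes losses in $[0, 1]$, writes $R(\lambda) = \mathbb{E}[\ell(X, Y; \lambda)]$ and $\hat{R}_n(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \ell(X_i, Y_i; \lambda)$, and assumes a hypothetical selector that always returns some $\hat{\lambda} \in \Lambda$ with $\hat{R}_n(\hat{\lambda}) \le \alpha$. Because $\hat{\lambda}$ depends only on the calibration data, the test risk equals $\mathbb{E}[R(\hat{\lambda})]$. Each $R(\lambda) - \hat{R}_n(\lambda)$ is a mean of $n$ independent centered variables ranging over an interval of length 1, hence sub-Gaussian with variance proxy $1/(4n)$ by Hoeffding's lemma, and the maximal inequality $\mathbb{E}[\max_{j \le m} Z_j] \le \sigma\sqrt{2\log m}$ for sub-Gaussian variables bounds the expected maximum over the grid:

```latex
\mathbb{E}\big[\ell(X_{n+1}, Y_{n+1}; \hat{\lambda})\big]
  = \mathbb{E}\big[R(\hat{\lambda})\big]
  \le \mathbb{E}\big[\hat{R}_n(\hat{\lambda})\big]
    + \mathbb{E}\Big[\max_{\lambda \in \Lambda}\big(R(\lambda) - \hat{R}_n(\lambda)\big)\Big]
  \le \alpha + \sqrt{\frac{2\log m}{4n}}
  = \alpha + \sqrt{\frac{\log m}{2n}} .
```

Running the same selector with the shifted target $\alpha - \sqrt{\log(m)/(2n)}$ therefore gives $\mathbb{E}[\ell(X_{n+1}, Y_{n+1}; \hat{\lambda})] \le \alpha$, provided the shifted target remains feasible.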

A matching minimax lower bound shows that no selection procedure, irrespective of adaptivity or prior knowledge, can achieve a smaller excess risk uniformly over all bounded, non-monotonic losses:

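Proposition: There exists a distribution such that the selected parameter $\hat{\lambda}$ satisfies
$$\mathbb{E}\left[\ell(X_{n+1}, Y_{n+1}; \hat{\lambda})\right] \ge \alpha + c\sqrt{\frac{\log m}{n}}$$
for a universal constant $c > 0$.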

This scaling is fundamental: increasing grid granularity (larger $m$) refines the discretization but increases the statistical penalty, so the two effects must be balanced.
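A small sketch of this trade-off, assuming losses in $[0, 1]$ and the explicit-constant correction $\sqrt{\log(m)/(2n)}$ that follows from Hoeffding's lemma plus a maximal inequality (the paper's constant may differ; `correction` is a hypothetical helper). The correction grows only like $\sqrt{\log m}$: going from $m = 10$ to $m = 10^4$ grid points merely doubles it, while a tenfold increase in $n$ shrinks it by a factor of $\sqrt{10}$.

```python
import math

def correction(n, m):
    """Explicit-constant excess-risk correction sqrt(log(m) / (2n)) for losses in [0, 1].

    Hypothetical helper: the constant 1/sqrt(2) comes from Hoeffding's lemma plus a
    maximal inequality over m grid points and may differ from the paper's.
    """
    return math.sqrt(math.log(m) / (2 * n))

for n in (500, 5000):
    for m in (10, 100, 1000, 10_000):
        print(f"n={n:5d}  m={m:6d}  correction={correction(n, m):.4f}")
```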

Exploiting Structure: Monotonicity, Lipschitz Losses, and Improved Rates

The authors establish a hierarchy of guarantees based on structural assumptions. For monotone losses (the classical CRC setting), exact risk control at level $\alpha$ is retained and the statistical correction term disappears. For globally Lipschitz losses under a margin condition, the excess risk decays exponentially in $n$; it is controlled by the probability that the selections computed from $n$ and from $n+1$ samples disagree. This stability-based perspective connects to the broader literature on algorithmic stability.
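A minimal Monte Carlo sketch of the stability quantity, under a hypothetical setup: scores $Z \sim \mathrm{Uniform}(0, 1)$, the 1-Lipschitz, non-monotone loss $\ell(z; \lambda) = |z - \lambda|$ (whose risk curve $\lambda^2 - \lambda + 1/2$ is U-shaped), a 21-point grid on $[0, 1]$, and a selector that returns the smallest grid point with empirical risk at most $\alpha = 0.3$. It estimates how often the selection changes when one extra (test) point is added to the $n$ calibration points. The estimate falls quickly with $n$, consistent with the stability view, although this toy does not verify the paper's exponential rate.

```python
import numpy as np

def smallest_feasible(emp_risk, alpha):
    """Index of the first grid point with empirical risk <= alpha (len(emp_risk) if none)."""
    ok = np.flatnonzero(emp_risk <= alpha)
    return int(ok[0]) if ok.size else emp_risk.size

def disagreement_rate(n, alpha=0.3, grid=None, trials=1000, seed=0):
    """Estimate P(selection from n samples != selection from n + 1 samples)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    rng = np.random.default_rng(seed)
    disagreements = 0
    for _ in range(trials):
        z = rng.random(n + 1)  # n calibration scores plus one extra (test) score
        losses = np.abs(z[:, None] - grid[None, :])  # shape (n + 1, m), values in [0, 1]
        sel_n = smallest_feasible(losses[:n].mean(axis=0), alpha)
        sel_n1 = smallest_feasible(losses.mean(axis=0), alpha)
        disagreements += sel_n != sel_n1
    return disagreements / trials

for n in (50, 500):
    print(n, disagreement_rate(n))
```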

Empirical Comparisons: CRC-NM vs. Monotonization and Bootstrapped Stability

The work benchmarks CRC-NM (the “non-monotonic” method, which uses the unaltered empirical risk with a finite-sample correction of order $\sqrt{\log(m)/n}$) against several alternatives, including empirical monotonicity transforms and stability-based bootstrap approaches (2604.01502).
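A minimal sketch of a CRC-NM-style selector, assuming losses in $[0, 1]$ stored as an $n \times m$ array over a grid sorted in ascending order. The correction constant $\sqrt{\log(m)/(2n)}$, the choice of the smallest qualifying grid value, and the fallback to the largest grid value are illustrative assumptions (and `crc_nm_select` is a hypothetical name), not necessarily the paper's exact rule.

```python
import numpy as np

def crc_nm_select(losses, grid, alpha):
    """CRC-NM-style selection sketch (hypothetical correction constant and fallback).

    losses: (n, m) array of per-sample losses in [0, 1], one column per grid value.
    grid:   length-m array of candidate lambdas, sorted ascending.
    Returns the smallest grid value whose empirical risk is at most
    alpha - sqrt(log(m) / (2n)), or the largest grid value if none qualifies.
    """
    n, m = losses.shape
    corr = np.sqrt(np.log(m) / (2 * n))
    emp_risk = losses.mean(axis=0)  # unaltered, possibly non-monotone, risk curve
    feasible = np.flatnonzero(emp_risk <= alpha - corr)
    return grid[feasible[0]] if feasible.size else grid[-1]

# Usage: an oscillating toy risk curve over five grid values.
rng = np.random.default_rng(0)
true_risk = np.array([0.30, 0.02, 0.25, 0.01, 0.0])
losses = (rng.random((1000, 5)) < true_risk).astype(float)
print(crc_nm_select(losses, np.linspace(0.0, 1.0, 5), alpha=0.1))
```

Because no monotone transform is applied, the rule can land on the low-risk dip at the second grid value even though the risk rises again at the third; the price is the explicit $\sqrt{\log(m)/n}$-order margin.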

Empirical risk selection is compared to two established “monotonization” strategies:

Loss-monotonization replaces each per-sample loss $\ell(x, y; \lambda)$ by its worst-case value $\max_{\lambda' \ge \lambda} \ell(x, y; \lambda')$, enforcing monotonicity per sample but usually yielding highly conservative thresholds.
Risk-monotonization enforces monotonicity on the aggregate empirical risk curve and selects the minimal feasible threshold, but it offers only asymptotic guarantees.

A third approach, CRC-C, applies bootstrap-estimated, stability-based corrections at the selection level.
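The two monotonization strategies can be sketched with a suffix maximum over the grid (assumed sorted ascending), combined with the standard CRC selection rule $\hat{\lambda} = \min\{\lambda : \tfrac{n}{n+1}\hat{R}_n(\lambda) + \tfrac{B}{n+1} \le \alpha\}$ for losses bounded by $B$. Applying that same rule to both transformed curves is an assumption made for comparability, and all function names are hypothetical.

```python
import numpy as np

def suffix_max(a, axis=0):
    """max over lambda' >= lambda along `axis` (grid assumed sorted ascending)."""
    flipped = np.flip(a, axis=axis)
    return np.flip(np.maximum.accumulate(flipped, axis=axis), axis=axis)

def crc_select(risk_curve, grid, alpha, n, B=1.0):
    """Standard CRC rule: smallest lambda with n/(n+1) * risk + B/(n+1) <= alpha."""
    ok = np.flatnonzero(n / (n + 1) * risk_curve + B / (n + 1) <= alpha)
    return grid[ok[0]] if ok.size else grid[-1]

def loss_monotonized_select(losses, grid, alpha):
    """Monotonize each sample's loss curve, then average (hypothetical helper)."""
    n = losses.shape[0]
    return crc_select(suffix_max(losses, axis=1).mean(axis=0), grid, alpha, n)

def risk_monotonized_select(losses, grid, alpha):
    """Average first, then monotonize the empirical risk curve (hypothetical helper)."""
    n = losses.shape[0]
    return crc_select(suffix_max(losses.mean(axis=0)), grid, alpha, n)

# Usage: two groups of samples whose losses peak at different grid values.
losses = np.vstack([np.tile([0.0, 0.4, 0.0, 0.0], (50, 1)),
                    np.tile([0.4, 0.0, 0.0, 0.0], (50, 1))])
grid = np.arange(4)
print(loss_monotonized_select(losses, grid, 0.3), risk_monotonized_select(losses, grid, 0.3))
```

The mean of per-sample suffix maxima is never smaller than the suffix maximum of the mean, so under a shared selection rule loss-monotonization always picks a threshold at least as large as risk-monotonization; in the toy above it selects grid value 1 versus 0, which is one way to see why it tends to be more conservative.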

In non-monotone synthetic multilabel settings, CRC-NM achieves precise risk control while adjusting only by the minimax correction of order $\sqrt{\log(m)/n}$, whereas the monotonization methods produce inflated set sizes and less stable risk control. In near-monotonic real-data experiments (ImageNet classification, COCO object detection), CRC-NM yields more stable and less conservative solutions than loss-monotonization and, when the empirical variance is low to moderate, more reliable risk control than CRC-C.

Figure 1: Empirical risk curves and selected thresholds under non-monotonic loss, contrasting achievable risk for CRC-NM, loss-monotonization, and risk-monotonization.

Figure 2: Distribution of empirical risks for ResNet-18 over ImageNet calibration-test splits showing that CRC-NM and CRC-C both control risk, with CRC-NM incurring a slightly larger explicit correction.

Figure 3: Synthetic multilabel classification experiment showing CRC-NM maintains tight risk control under oscillatory, non-monotonic loss landscapes, compared to other methods.

Figure 4: COCO object detection experiment: (left) test risk distributions showing CRC-NM's conservative but reliable control; (right) corresponding prediction-set size distributions highlighting efficiency.

Extensions: Distributional Shift and Importance Weighting

The framework extends to non-i.i.d. test settings (covariate shift and more general distribution shift) by importance weighting the empirical risk of each candidate parameter. If the likelihood ratio $w = \mathrm{d}P_{\text{test}}/\mathrm{d}P_{\text{cal}}$ between the test and calibration distributions is bounded by $w_{\max}$, the excess risk bound grows linearly in $w_{\max}$, but the $\sqrt{\log(m)/n}$ rate is preserved.
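A minimal sketch of importance-weighted selection, assuming the likelihood ratios $w(x_i)$ (covariate shift) are known at the calibration points and bounded by $w_{\max}$, with losses in $[0, 1]$. Each weighted loss then lies in $[0, w_{\max}]$, so a Hoeffding bound combined with a maximal inequality over the $m$ grid points gives a correction $w_{\max}\sqrt{\log(m)/(2n)}$: linear in $w_{\max}$, with the rate in $m$ and $n$ unchanged. The estimator, constant, fallback, and function names are illustrative assumptions.

```python
import numpy as np

def weighted_empirical_risk(losses, weights):
    """(1/n) * sum_i w_i * loss_i(lambda): unbiased for the test risk when w_i is the
    true likelihood ratio dP_test/dP_cal at calibration point i."""
    return (weights[:, None] * losses).mean(axis=0)

def weighted_select(losses, weights, grid, alpha, w_max):
    """Smallest grid value whose weighted risk is <= alpha - w_max * sqrt(log(m) / (2n)).

    Hypothetical rule: weighted losses lie in [0, w_max], so the Hoeffding-based
    correction scales linearly in w_max. Falls back to the largest grid value.
    """
    n, m = losses.shape
    corr = w_max * np.sqrt(np.log(m) / (2 * n))
    risk = weighted_empirical_risk(losses, weights)
    ok = np.flatnonzero(risk <= alpha - corr)
    return grid[ok[0]] if ok.size else grid[-1]

# Usage: the first grid value always fails; the others have zero loss.
losses = np.zeros((200, 3))
losses[:, 0] = 1.0
weights = np.ones(200)
print(weighted_select(losses, weights, np.arange(3), 0.1, w_max=1.0))  # 1
print(weighted_select(losses, weights, np.arange(3), 0.1, w_max=5.0))  # 2 (fallback)
```

With $w_{\max} = 5$ the correction exceeds $\alpha$ itself, so no grid value qualifies: a larger bound on the likelihood ratio directly costs calibration data.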

Practical and Theoretical Implications

Practical Control: For bounded non-monotonic losses, risk control at target level $\alpha$ is feasible and theoretically valid even without monotonicity, given sufficient calibration data relative to the grid size.
Model Selection Perspective: Parameter tuning in non-monotone CRC can be recast as a statistical model selection problem; the derived minimax rate is fundamental rather than specific to any single selection rule.
Methodological Guidance: Loss monotonization is structurally conservative and risk monotonization is only asymptotically valid; direct finite-sample corrections, as in CRC-NM, yield adaptive and reliable calibration with an explicitly quantified excess risk.
Scalability: Since the excess risk grows only as $\sqrt{\log m}$ in the grid size $m$, there is substantial flexibility in grid design, especially as the sample size increases.
Extension to Distribution Shift: With importance weighting, the guarantees extend to covariate and more general distribution shift, provided the likelihood ratio between test and calibration distributions is bounded.

Conclusion

This work establishes that conformal risk control can provide rigorous expectation-level risk bounds when the loss is non-monotonic and the parameter grid is finite. The minimax lower bound and the explicit excess risk corrections quantify both the opportunities and the limitations of non-monotonic CRC relative to the monotonic setting. These findings bear directly on deploying risk-controlling prediction sets in vision, medicine, and other high-stakes domains that rely on discrete or thresholded policies, and they motivate further research on continuous parameter settings and heavy-tailed losses (2604.01502).
