
Partition Learning Conformal Prediction

Updated 15 January 2026
  • Partition Learning Conformal Prediction (PLCP) is a framework that enhances traditional conformal prediction by using data-driven or semantic partitions to localize coverage guarantees.
  • PLCP adapts thresholds and penalty mechanisms within each partition, balancing local uncertainty and efficiency across heterogeneous data subsets.
  • PLCP methods extend to federated, multiclass, and model diagnostic scenarios, demonstrating improved prediction set interpretability and robust empirical performance.

Partition Learning Conformal Prediction (PLCP) generalizes and strengthens conformal prediction (CP) methodologies by integrating data-driven or semantically informed partitions of the input or label space. PLCP frameworks enable the construction of prediction sets that maintain rigorous marginal or approximate conditional coverage guarantees, exploit heterogeneous uncertainty structures, and improve practical efficiency and interpretability. Several distinct PLCP formulations exist, including partitioned calibrations for conditional guarantees, group-penalized set formation leveraging semantic similarity, federated schemes on disjoint data silos, and approaches using learned partitions from model diagnostics.

1. Foundations: Conformal Prediction and Partitioning

Conformal prediction provides finite-sample valid prediction sets for regression, classification, and other supervised tasks. Let (X, Y) be drawn from a population distribution, with output Y taking values in a finite set (classification) or ℝ (regression). A trained black-box model f provides, for each input x, a conformity score s(x, y) that quantifies how unlikely y is as the label for x. The canonical CP framework computes a threshold τ from the distribution of conformity scores on a calibration set, yielding the prediction set

C_α(x) = { y : s(x, y) ≤ τ },

which guarantees marginal coverage

P(Y ∈ C_α(X)) ≥ 1 − α.

Partition Learning Conformal Prediction augments this by introducing partitions of the feature space 𝒳 or of the label space, so that coverage guarantees or efficiency can be localized or improved within each region or group. Inhomogeneous uncertainty, class similarities, and structural knowledge all motivate this extension (Kiyani et al., 2024; Bai et al., 2022; Fargion et al., 24 Nov 2025; LeRoy et al., 2021; Spjuth et al., 2018).
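The split-conformal recipe above can be sketched in a few lines. This is a minimal illustration with invented names (`conformal_threshold`, `prediction_set`), not code from any cited paper:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-valid threshold: the ceil((n+1)(1-alpha))-th order
    statistic of the calibration conformity scores."""
    n = len(cal_scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(cal_scores)[k - 1]

def prediction_set(score_fn, x, labels, tau):
    """C_alpha(x) = {y : s(x, y) <= tau} for a finite label set."""
    return [y for y in labels if score_fn(x, y) <= tau]

# With 1000 roughly uniform calibration scores and alpha = 0.1,
# the threshold lands near the 0.9 quantile.
rng = np.random.default_rng(0)
tau = conformal_threshold(rng.uniform(size=1000), alpha=0.1)
```

The (n+1) correction in the order statistic is what yields the finite-sample guarantee, rather than the plain empirical (1 − α)-quantile.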

2. Partitioned Thresholds and Empirical Risk Approaches

PLCP formulations partition the calibration or test domain into M disjoint regions {R_1, …, R_M}. Within each R_m, a separate threshold τ_m for the conformity score is chosen:

C_τ(x) = { y : s(x, y) ≤ τ_{m(x)} },

where m(x) identifies the region containing x. This enables per-region calibration of the sets, targeting regions with heterogeneous uncertainty.

The empirical risk-minimization approach frames the optimization as

min_{τ ∈ ℝ^M_+} (1/n) Σ_{i=1}^n ℓ_eff(C_τ(x_i); y_i)

subject to empirical miscoverage ≤ α. This is typically solved via Lagrangian relaxation, with a hinge surrogate for the indicator-based coverage constraint and gradient-based methods for efficiency (Bai et al., 2022). The partition size M balances local adaptivity against statistical error, a key practical consideration.
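A minimal sketch of region-wise calibration with a fixed, known partition m(x); the function name and the toy data are illustrative, and the full ERM over τ is replaced by per-region conformal quantiles:

```python
import numpy as np

def region_thresholds(cal_scores, cal_regions, n_regions, alpha):
    """One conformal quantile per region: tau_m from the scores falling in R_m."""
    taus = np.empty(n_regions)
    for m in range(n_regions):
        s = np.sort(cal_scores[cal_regions == m])
        n = len(s)
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
        taus[m] = s[k - 1]
    return taus

# Toy data: region 1 is noisier, so its threshold should come out larger,
# giving wider sets exactly where uncertainty is higher.
rng = np.random.default_rng(1)
regions = rng.integers(0, 2, size=2000)
scores = rng.uniform(size=2000) * np.where(regions == 1, 2.0, 1.0)
taus = region_thresholds(scores, regions, n_regions=2, alpha=0.1)
```

With enough calibration points per region this recovers region-wise coverage; with too many regions each quantile is estimated from few points, which is the adaptivity-versus-statistical-error trade-off noted above.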

3. Learning Data-Driven Partitions for Conditional Validity

Achieving nontrivial conditional coverage is impossible in finite samples without further structure. PLCP strategies therefore learn a low-complexity partition or soft assignment h : 𝒳 → Δ_m from calibration data. For instance, h^i(x) could be the (learned) probability of assigning x to partition i; typically h(x) = softmax(φ(x; θ)) for a neural network φ (Kiyani et al., 2024). Each group or partition receives its own quantile threshold q_i.

The joint optimization minimizes a weighted pinball loss for quantile estimation:

L(θ, q) = (1/n) Σ_{j=1}^n Σ_{i=1}^m h^i(X_j; θ) ℓ_α(q_i, S_j) + λ R(θ)

Alternating minimization (over q and θ) or Lagrangian-based constrained ERM is used. This approach generalizes classical split conformal methods and accommodates powerful, flexible models, while still yielding strict marginal coverage and tight mean-square conditional error (MSCE) bounds (Kiyani et al., 2024).
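The objective above can be written down directly. This sketch assumes array shapes H[j, i] = h^i(X_j; θ) and S[j] for the calibration scores, and omits the regularizer R(θ) and the optimization over θ itself:

```python
import numpy as np

def pinball(q, s, level):
    """Pinball (quantile) loss at the given level."""
    d = s - q
    return np.maximum(level * d, (level - 1) * d)

def weighted_pinball_objective(H, S, q, level):
    """(1/n) sum_j sum_i H[j, i] * pinball(q_i, S_j); R(theta) omitted."""
    return np.mean(np.sum(H * pinball(q[None, :], S[:, None], level), axis=1))

# With all mass on a single group this reduces to plain quantile estimation:
# at level 0.5 the objective is half the mean absolute deviation from q.
obj = weighted_pinball_objective(np.ones((4, 1)),
                                 np.array([0.0, 1.0, 2.0, 3.0]),
                                 np.array([2.0]), level=0.5)
```

Alternating minimization then fixes θ (hence H) and solves each q_i as a weighted quantile, then fixes q and takes gradient steps in θ.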

4. Semantic and Learned Label-Partitional Penalization

In multiclass settings, semantic class structure can be leveraged for efficiency and group-coherence of prediction sets. Suppose C labels are grouped into G semantic groups 𝒢 = {G_1, …, G_G}, with g(y) denoting group membership. PLCP modifies the conformity score via a partition-based penalty:

s_λ(x, y) = s(x, y) + λ · 1{ g(y) ≠ g(ŷ(x)) },

where ŷ(x) is the model's top predicted label. This penalization discourages including out-of-group labels in the prediction set. The threshold τ_λ is computed as usual, and the prediction sets become

C_{α,λ}(x) = { y : s_λ(x, y) ≤ τ_λ }.

Such penalty-augmented CP guarantees marginal coverage, provably reduces the number of semantic groups per set, and, under mild conditions, reduces average set size (Fargion et al., 24 Nov 2025). A model-specific soft similarity penalty generalizes this by constructing a learned class-similarity matrix from model embeddings, obviating hand-crafted groupings and further improving efficiency.
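A small illustration of the penalized score s_λ under an assumed group labeling; the function name and toy numbers are invented:

```python
import numpy as np

def penalized_scores(base_scores, groups, top_label, lam):
    """s_lambda(x, y) = s(x, y) + lam * 1{g(y) != g(y_hat)} for every label y."""
    penalty = lam * (groups != groups[top_label]).astype(float)
    return base_scores + penalty

# Toy: 4 labels in groups [0, 0, 1, 1]; the top prediction is label 0,
# so labels 2 and 3 (the other group) get the additive penalty.
s = np.array([0.1, 0.4, 0.2, 0.3])
groups = np.array([0, 0, 1, 1])
s_lam = penalized_scores(s, groups, top_label=0, lam=0.5)

# With an (assumed) threshold tau = 0.45, the set keeps only in-group labels.
pred_set = [y for y in range(4) if s_lam[y] <= 0.45]
```

Note that without the penalty the same threshold would also admit labels 2 and 3, so the penalty trades raw score ordering for group coherence.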

5. Federated and Disjoint Data Partition Aggregation

In privacy-sensitive or federated settings, data are distributed across K sites. Each site builds a local conformal predictor and, for each test instance, returns smoothed conformal p-values for every candidate label. The global aggregator outputs prediction sets using the average (or weighted) p-values from all sites:

p̄(y | x) = (1/K) Σ_{k=1}^K p^{(k)}(y | x)

Γ_agg(x) = { y : p̄(y | x) > ε }

This maintains the super-uniformity of the p-values and thus valid marginal coverage for the aggregated prediction set. The partition here lies over data sources rather than over the feature space 𝒳, underscoring PLCP's applicability in distributed learning scenarios (Spjuth et al., 2018).
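The aggregation step can be sketched as follows, assuming each site exposes smoothed conformal p-values; the interface and names are hypothetical:

```python
import numpy as np

def smoothed_p_value(cal_scores, test_score, rng):
    """Smoothed conformal p-value: (#{s > t} + U * (#{s == t} + 1)) / (n + 1),
    with U uniform on [0, 1) to break ties."""
    n = len(cal_scores)
    gt = np.sum(cal_scores > test_score)
    eq = np.sum(cal_scores == test_score)
    return (gt + rng.uniform() * (eq + 1)) / (n + 1)

def aggregate_set(p_values_per_site, eps):
    """Gamma_agg(x) = {y : mean_k p^(k)(y|x) > eps}; input has shape (K, |Y|)."""
    p_bar = np.mean(p_values_per_site, axis=0)
    return [y for y in range(p_bar.shape[0]) if p_bar[y] > eps]

# Toy: 3 sites, 2 candidate labels; label 0 looks plausible at every site.
P = np.array([[0.60, 0.02],
              [0.50, 0.04],
              [0.70, 0.01]])
gamma = aggregate_set(P, eps=0.1)
```

Only p-values cross the site boundary, so raw calibration data never leave each silo.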

6. Model Diagnostics and Uncertainty-Driven Partition Formation

Partition learning in conformal inference can also be driven by model diagnostics, as in MD-split+. Here, partitions are formed by clustering regions of the feature space with similar local misfit diagnostics, specifically similarity among the empirical distributions of Highest Predictive Density (HPD) values. Calibration, density fitting, and quantile regression over HPD scores construct "diagnostic signatures" for partitioning, followed by split-conformal inference within each cluster. This approach is robust to model misspecification and addresses local coverage errors that are undetectable by traditional methods (LeRoy et al., 2021).
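A loose sketch in the spirit of this section, not the actual MD-split+ algorithm: partition calibration points by quantile bins of a scalar diagnostic (standing in for an HPD-based signature), then calibrate within each bin:

```python
import numpy as np

def diagnostic_partition(diag, n_bins):
    """Assign each point to a quantile bin of its scalar diagnostic value."""
    edges = np.quantile(diag, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, diag)

def per_bin_thresholds(scores, bins, n_bins, alpha):
    """Split-conformal threshold computed separately inside each diagnostic bin."""
    taus = np.empty(n_bins)
    for b in range(n_bins):
        s = np.sort(scores[bins == b])
        k = min(int(np.ceil((len(s) + 1) * (1 - alpha))), len(s))
        taus[b] = s[k - 1]
    return taus

# Toy setup where model misfit grows with the diagnostic, so the per-bin
# thresholds should increase across bins.
rng = np.random.default_rng(2)
diag = rng.uniform(size=3000)
scores = rng.uniform(size=3000) * (1.0 + diag)
bins = diagnostic_partition(diag, n_bins=3)
taus = per_bin_thresholds(scores, bins, n_bins=3, alpha=0.1)
```

The real method replaces the scalar quantile binning here with clustering of full HPD-distribution signatures, but the downstream per-cluster split-conformal step is the same shape.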

7. Theoretical Guarantees and Empirical Performance

PLCP frameworks preserve finite-sample marginal validity under exchangeability assumptions; the only loss relative to global CP is a moderate increase in uniform-convergence error, proportional to the number of partitions or groups. For learned or fixed partitions, finite-sample and infinite-sample MSCE can be tightly bounded, with favorable scaling in partition size and calibration-set size (Bai et al., 2022; Kiyani et al., 2024). Group-penalized and model-specific similarity variants maintain strict coverage, sharply reduce semantic dispersion, and, in empirical studies, achieve significant reductions in average set size and improved conditional-coverage uniformity across regression, classification, and covariate-shift scenarios (Fargion et al., 24 Nov 2025; Kiyani et al., 2024).

Systematic experiments demonstrate that PLCP outperforms classical CP and other local or conditional-coverage baselines on real-world and synthetic tasks, with improvements in worst-case subgroup coverage, prediction set efficiency, and interpretability across settings with known and unknown class semantics, distribution shift, and federated data (Fargion et al., 24 Nov 2025, Kiyani et al., 2024, LeRoy et al., 2021, Spjuth et al., 2018).


Table: Key PLCP Variants and Targets

PLCP Variant              | Partition Domain           | Core Mechanism
--------------------------|----------------------------|---------------------------------------
Region-specific threshold | Input/covariate space      | Quantile per region/group
Semantic label penalty    | Label semantic groups      | Penalization of out-of-group labels
Model-specific similarity | Learned label similarities | Penalty via embedding-based similarity
Federated PLCP            | Data-source partitions     | P-value aggregation across sites
Diagnostic partitions     | Model-misfit clusters      | Partition via local diagnostic signatures
