
Partition Learning Conformal Prediction

Updated 15 January 2026
  • Partition Learning Conformal Prediction (PLCP) is a framework that enhances traditional conformal prediction by using data-driven or semantic partitions to localize coverage guarantees.
  • PLCP adapts thresholds and penalty mechanisms within each partition, balancing local uncertainty and efficiency across heterogeneous data subsets.
  • PLCP methods extend to federated, multiclass, and model diagnostic scenarios, demonstrating improved prediction set interpretability and robust empirical performance.

Partition Learning Conformal Prediction (PLCP) generalizes and strengthens conformal prediction (CP) methodologies by integrating data-driven or semantically informed partitions of the input or label space. PLCP frameworks enable the construction of prediction sets that maintain rigorous marginal or approximate conditional coverage guarantees, exploit heterogeneous uncertainty structures, and improve practical efficiency and interpretability. Several distinct PLCP formulations exist, including partitioned calibrations for conditional guarantees, group-penalized set formation leveraging semantic similarity, federated schemes on disjoint data silos, and approaches using learned partitions from model diagnostics.

1. Foundations: Conformal Prediction and Partitioning

Conformal prediction provides finite-sample valid prediction sets for regression, classification, and other supervised tasks. Let (X, Y) be drawn from a population distribution, with output Y taking values in a finite set (classification) or ℝ (regression). A trained black-box model f provides, for each input x, a conformity score s(x, y) that quantifies how unlikely y is as the label for x. The canonical CP framework computes a threshold τ from the distribution of conformity scores on a calibration set, yielding the prediction set

C_α(x) = { y : s(x, y) ≤ τ },

which guarantees marginal coverage

P(Y ∈ C_α(X)) ≥ 1 − α.

Partition Learning Conformal Prediction augments this by introducing partitions of the feature space 𝒳 or of the label space, so that coverage guarantees or efficiency can be localized or improved within each region or group. Inhomogeneous uncertainty, class similarities, and structural knowledge all motivate this extension (Kiyani et al., 2024; Bai et al., 2022; Fargion et al., 24 Nov 2025; LeRoy et al., 2021; Spjuth et al., 2018).
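The split-conformal recipe above can be sketched in a few lines. This is a minimal illustration with invented names (`conformal_threshold`, `prediction_set`), not code from any cited paper:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-valid threshold: the ceil((n+1)(1-alpha))-th order
    statistic of the calibration conformity scores."""
    n = len(cal_scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(cal_scores)[k - 1]

def prediction_set(score_fn, x, labels, tau):
    """C_alpha(x) = {y : s(x, y) <= tau} for a finite label set."""
    return [y for y in labels if score_fn(x, y) <= tau]

# With 1000 roughly uniform calibration scores and alpha = 0.1,
# the threshold lands near the 0.9 quantile.
rng = np.random.default_rng(0)
tau = conformal_threshold(rng.uniform(size=1000), alpha=0.1)
```

The (n+1) correction in the order statistic is what yields the finite-sample guarantee, rather than the plain empirical (1 − α)-quantile.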

2. Partitioned Thresholds and Empirical Risk Approaches

PLCP formulations partition the calibration or test domain into M disjoint regions {R_1, …, R_M}. Within each R_m, a separate threshold τ_m for the conformity score is chosen:

C_τ(x) = { y : s(x, y) ≤ τ_{m(x)} },

where m(x) identifies the region containing x. This enables per-region calibration of the sets, targeting regions with heterogeneous uncertainty.

The empirical risk-minimization approach frames the optimization as

min_{τ ∈ ℝ^M_+} (1/n) Σ_{i=1}^n ℓ_eff(C_τ(x_i); y_i)

subject to empirical miscoverage ≤ α. This is typically solved via Lagrangian relaxation, with a hinge surrogate for the indicator-based coverage constraint and gradient-based methods for efficiency (Bai et al., 2022). The partition size M balances local adaptivity against statistical error, a key practical consideration.
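A minimal sketch of region-wise calibration with a fixed, known partition m(x); the function name and the toy data are illustrative, and the full ERM over τ is replaced by per-region conformal quantiles:

```python
import numpy as np

def region_thresholds(cal_scores, cal_regions, n_regions, alpha):
    """One conformal quantile per region: tau_m from the scores falling in R_m."""
    taus = np.empty(n_regions)
    for m in range(n_regions):
        s = np.sort(cal_scores[cal_regions == m])
        n = len(s)
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
        taus[m] = s[k - 1]
    return taus

# Toy data: region 1 is noisier, so its threshold should come out larger,
# giving wider sets exactly where uncertainty is higher.
rng = np.random.default_rng(1)
regions = rng.integers(0, 2, size=2000)
scores = rng.uniform(size=2000) * np.where(regions == 1, 2.0, 1.0)
taus = region_thresholds(scores, regions, n_regions=2, alpha=0.1)
```

With enough calibration points per region this recovers region-wise coverage; with too many regions each quantile is estimated from few points, which is the adaptivity-versus-statistical-error trade-off noted above.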

3. Learning Data-Driven Partitions for Conditional Validity

Achieving nontrivial conditional coverage is impossible in finite samples without further structure. PLCP strategies therefore learn a low-complexity partition or soft assignment h : 𝒳 → Δ_m from calibration data. For instance, h^i(x) could be the (learned) probability of assigning x to partition i; typically h(x) = softmax(φ(x; θ)) for a neural network φ (Kiyani et al., 2024). Each group or partition receives its own quantile threshold q_i.

The joint optimization minimizes a weighted pinball loss for quantile estimation:

L(θ, q) = (1/n) Σ_{j=1}^n Σ_{i=1}^m h^i(X_j; θ) ℓ_α(q_i, S_j) + λ R(θ)

Alternating minimization (over q and θ) or Lagrangian-based constrained ERM is used. This approach generalizes classical split conformal methods and accommodates powerful, flexible models, while still yielding strict marginal coverage and tight mean-square conditional error (MSCE) bounds (Kiyani et al., 2024).
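The objective above can be written down directly. This sketch assumes array shapes H[j, i] = h^i(X_j; θ) and S[j] for the calibration scores, and omits the regularizer R(θ) and the optimization over θ itself:

```python
import numpy as np

def pinball(q, s, level):
    """Pinball (quantile) loss at the given level."""
    d = s - q
    return np.maximum(level * d, (level - 1) * d)

def weighted_pinball_objective(H, S, q, level):
    """(1/n) sum_j sum_i H[j, i] * pinball(q_i, S_j); R(theta) omitted."""
    return np.mean(np.sum(H * pinball(q[None, :], S[:, None], level), axis=1))

# With all mass on a single group this reduces to plain quantile estimation:
# at level 0.5 the objective is half the mean absolute deviation from q.
obj = weighted_pinball_objective(np.ones((4, 1)),
                                 np.array([0.0, 1.0, 2.0, 3.0]),
                                 np.array([2.0]), level=0.5)
```

Alternating minimization then fixes θ (hence H) and solves each q_i as a weighted quantile, then fixes q and takes gradient steps in θ.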

4. Semantic and Learned Label-Partitional Penalization

In multiclass settings, semantic class structure can be leveraged for efficiency and group-coherence of prediction sets. Suppose C labels are grouped into G semantic groups 𝒢 = {G_1, …, G_G}, with g(y) denoting group membership. PLCP modifies the conformity score via a partition-based penalty:

s_λ(x, y) = s(x, y) + λ · 1{ g(y) ≠ g(ŷ(x)) },

where ŷ(x) is the model's top predicted label. This penalization discourages including out-of-group labels in the prediction set. The threshold τ_λ is computed as usual, and the prediction sets become

C_{α,λ}(x) = { y : s_λ(x, y) ≤ τ_λ }.

Such penalty-augmented CP guarantees marginal coverage, provably reduces the number of semantic groups per set, and, under mild conditions, reduces average set size (Fargion et al., 24 Nov 2025). A model-specific soft similarity penalty generalizes this by constructing a learned class-similarity matrix from model embeddings, obviating hand-crafted groupings and further improving efficiency.
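A small illustration of the penalized score s_λ under an assumed group labeling; the function name and toy numbers are invented:

```python
import numpy as np

def penalized_scores(base_scores, groups, top_label, lam):
    """s_lambda(x, y) = s(x, y) + lam * 1{g(y) != g(y_hat)} for every label y."""
    penalty = lam * (groups != groups[top_label]).astype(float)
    return base_scores + penalty

# Toy: 4 labels in groups [0, 0, 1, 1]; the top prediction is label 0,
# so labels 2 and 3 (the other group) get the additive penalty.
s = np.array([0.1, 0.4, 0.2, 0.3])
groups = np.array([0, 0, 1, 1])
s_lam = penalized_scores(s, groups, top_label=0, lam=0.5)

# With an (assumed) threshold tau = 0.45, the set keeps only in-group labels.
pred_set = [y for y in range(4) if s_lam[y] <= 0.45]
```

Note that without the penalty the same threshold would also admit labels 2 and 3, so the penalty trades raw score ordering for group coherence.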

5. Federated and Disjoint Data Partition Aggregation

In privacy-sensitive or federated settings, data are distributed across K sites. Each site builds a local conformal predictor and, for each test instance, returns smoothed conformal p-values for every candidate label. The global aggregator outputs prediction sets using the average (or weighted) p-values from all sites:

p̄(y | x) = (1/K) Σ_{k=1}^K p^{(k)}(y | x)

Γ_agg(x) = { y : p̄(y | x) > ε }

This maintains the super-uniformity of the p-values and thus valid marginal coverage for the aggregated prediction set. The partition here lies over data sources rather than over the feature space 𝒳, underscoring PLCP's applicability in distributed learning scenarios (Spjuth et al., 2018).
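The aggregation step can be sketched as follows, assuming each site exposes smoothed conformal p-values; the interface and names are hypothetical:

```python
import numpy as np

def smoothed_p_value(cal_scores, test_score, rng):
    """Smoothed conformal p-value: (#{s > t} + U * (#{s == t} + 1)) / (n + 1),
    with U uniform on [0, 1) to break ties."""
    n = len(cal_scores)
    gt = np.sum(cal_scores > test_score)
    eq = np.sum(cal_scores == test_score)
    return (gt + rng.uniform() * (eq + 1)) / (n + 1)

def aggregate_set(p_values_per_site, eps):
    """Gamma_agg(x) = {y : mean_k p^(k)(y|x) > eps}; input has shape (K, |Y|)."""
    p_bar = np.mean(p_values_per_site, axis=0)
    return [y for y in range(p_bar.shape[0]) if p_bar[y] > eps]

# Toy: 3 sites, 2 candidate labels; label 0 looks plausible at every site.
P = np.array([[0.60, 0.02],
              [0.50, 0.04],
              [0.70, 0.01]])
gamma = aggregate_set(P, eps=0.1)
```

Only p-values cross the site boundary, so raw calibration data never leave each silo.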

6. Model Diagnostics and Uncertainty-Driven Partition Formation

Partition learning in conformal inference can also be driven by model diagnostics, as in MD-split+. Here, partitions are formed by clustering regions of the feature space with similar local misfit diagnostics, specifically similarity among the empirical distributions of Highest Predictive Density (HPD) values. Calibration, density fitting, and quantile regression over HPD scores construct "diagnostic signatures" for partitioning, followed by split-conformal inference within each cluster. This approach is robust to model misspecification and addresses local coverage errors that are undetectable by traditional methods (LeRoy et al., 2021).
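A loose sketch in the spirit of this section, not the actual MD-split+ algorithm: partition calibration points by quantile bins of a scalar diagnostic (standing in for an HPD-based signature), then calibrate within each bin:

```python
import numpy as np

def diagnostic_partition(diag, n_bins):
    """Assign each point to a quantile bin of its scalar diagnostic value."""
    edges = np.quantile(diag, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, diag)

def per_bin_thresholds(scores, bins, n_bins, alpha):
    """Split-conformal threshold computed separately inside each diagnostic bin."""
    taus = np.empty(n_bins)
    for b in range(n_bins):
        s = np.sort(scores[bins == b])
        k = min(int(np.ceil((len(s) + 1) * (1 - alpha))), len(s))
        taus[b] = s[k - 1]
    return taus

# Toy setup where model misfit grows with the diagnostic, so the per-bin
# thresholds should increase across bins.
rng = np.random.default_rng(2)
diag = rng.uniform(size=3000)
scores = rng.uniform(size=3000) * (1.0 + diag)
bins = diagnostic_partition(diag, n_bins=3)
taus = per_bin_thresholds(scores, bins, n_bins=3, alpha=0.1)
```

The real method replaces the scalar quantile binning here with clustering of full HPD-distribution signatures, but the downstream per-cluster split-conformal step is the same shape.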

7. Theoretical Guarantees and Empirical Performance

PLCP frameworks preserve finite-sample marginal validity under exchangeability assumptions; the only loss relative to global CP is a moderate increase in uniform-convergence error, proportional to the number of partitions or groups. For learned or fixed partitions, finite-sample and infinite-sample MSCE can be tightly bounded, with favorable scaling in partition size and calibration-set size (Bai et al., 2022; Kiyani et al., 2024). Group-penalized and model-specific similarity variants maintain strict coverage, sharply reduce semantic dispersion, and, in empirical studies, achieve significant reductions in average set size and improved conditional-coverage uniformity across regression, classification, and covariate-shift scenarios (Fargion et al., 24 Nov 2025; Kiyani et al., 2024).

Systematic experiments demonstrate that PLCP outperforms classical CP and other local or conditional-coverage baselines on real-world and synthetic tasks, with improvements in worst-case subgroup coverage, prediction set efficiency, and interpretability across settings with known and unknown class semantics, distribution shift, and federated data (Fargion et al., 24 Nov 2025, Kiyani et al., 2024, LeRoy et al., 2021, Spjuth et al., 2018).


Table: Key PLCP Variants and Targets

PLCP Variant              | Partition Domain           | Core Mechanism
--------------------------|----------------------------|---------------------------------------
Region-specific threshold | Input/covariate space      | Quantile per region/group
Semantic label penalty    | Label semantic groups      | Penalization of out-of-group labels
Model-specific similarity | Learned label similarities | Penalty via embedding-based similarity
Federated PLCP            | Data-source partitions     | P-value aggregation across sites
Diagnostic partitions     | Model-misfit clusters      | Partition via local diagnostic signatures
