
Probabilistic classification from possibilistic data: computing Kullback-Leibler projection with a possibility distribution

Published 2 Apr 2026 in cs.AI and cs.LG | (2604.01939v1)

Abstract: We consider learning with possibilistic supervision for multi-class classification. For each training instance, the supervision is a normalized possibility distribution that expresses graded plausibility over the classes. From this possibility distribution, we construct a non-empty closed convex set of admissible probability distributions by combining two requirements: probabilistic compatibility with the possibility and necessity measures induced by the possibility distribution, and linear shape constraints that must be satisfied to preserve the qualitative structure of the possibility distribution. Thus, classes with the same possibility degree receive equal probabilities, and if a class has a strictly larger possibility degree than another class, then it receives a strictly larger probability. Given a strictly positive probability vector output by a model for an instance, we compute its Kullback-Leibler projection onto the admissible set. This projection yields the closest admissible probability distribution in Kullback-Leibler sense. We can then train the model by minimizing the divergence between the prediction and its projection, which quantifies the smallest adjustment needed to satisfy the induced dominance and shape constraints. The projection is computed with Dykstra's algorithm using Bregman projections associated with the negative entropy, and we provide explicit formulas for the projections onto each constraint set. Experiments conducted on synthetic data and on a real-world natural language inference task, based on the ChaosNLI dataset, show that the proposed projection algorithm is efficient enough for practical use, and that the resulting projection-based learning objective can improve predictive performance.

Authors (2)

Summary

  • The paper introduces a KL-projection approach that computes projections from model outputs onto a convex set defined by possibilistic constraints.
  • It derives explicit analytical expressions for both dominance and order-preserving constraints using Dykstra’s algorithm with Bregman projections for computational efficiency.
  • Empirical evaluations on synthetic and NLI tasks show improved performance under ambiguous, imprecise annotations compared to conventional methods.

Probabilistic Classification from Possibilistic Data via KL Projection: A Technical Synthesis

Motivation and Problem Setting

This paper addresses multi-class probabilistic classification under possibilistic supervision, where each data instance is annotated not with a categorical class label or probability vector, but with a normalized possibility distribution $\pi$ over the classes. Possibility theory, distinct from probability theory, encapsulates epistemic uncertainty (uncertainty arising from lack of knowledge) and induces dual possibility ($\Pi$) and necessity ($N$) measures. Practical situations motivating this paradigm include imprecise, incomplete, or crowd-aggregated annotations, where representing graded plausibility of classes is more meaningful than enforcing precise probability distributions.

The core technical challenge is twofold: (i) define a “feasible” convex set of probability distributions consistent with the provided possibility distribution, and (ii) train probabilistic classifiers (e.g., softmax neural networks) such that their output is compatible, in a precise sense, with this possibilistic supervision.

Constructing the Set of Feasible Probability Distributions

Given a normalized possibility distribution $\pi$ over a finite label set $Y$, the authors specify a closed convex set $\mathcal{F}^{\mathrm{box}}(\pi) \subset \Delta_n$ (the probability simplex), comprising probability distributions $p$ that must satisfy two classes of constraints:

  1. Dominance constraints (compatibility w.r.t. $\Pi$ and $N$): for every subset $A \subseteq Y$, the induced probability satisfies $N(A) \le P(A) \le \Pi(A)$, where $P(A) = \sum_{y \in A} p(y)$.
  2. Order-preserving (shape) constraints: the ordering and ties of possibility degrees in $\pi$ are preserved in $p$. That is, if $\pi(y) = \pi(y')$, then $p(y) = p(y')$; and if $\pi(y) > \pi(y')$, then $p(y) > p(y')$.
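For small label sets, the dominance constraints can be checked by brute force over all subsets. The sketch below is our own illustration (not code from the paper); `possibility_measure` and `necessity_measure` are the standard maxitive measures induced by $\pi$:

```python
from itertools import combinations

def possibility_measure(pi, A):
    # Pi(A) = max possibility over the classes in A; Pi(empty set) = 0.
    return max(pi[i] for i in A) if A else 0.0

def necessity_measure(pi, A):
    # N(A) = 1 - Pi(complement of A); N(Y) = 1.
    comp = [i for i in range(len(pi)) if i not in A]
    return 1.0 - possibility_measure(pi, comp) if comp else 1.0

def satisfies_dominance(p, pi, tol=1e-9):
    """Check N(A) <= P(A) <= Pi(A) for every subset A of classes."""
    n = len(pi)
    for r in range(n + 1):
        for A in combinations(range(n), r):
            P_A = sum(p[i] for i in A)
            if not (necessity_measure(pi, A) - tol
                    <= P_A
                    <= possibility_measure(pi, A) + tol):
                return False
    return True
```

For example, with $\pi = (1, 0.5, 0.5)$, the vector $(2/3, 1/6, 1/6)$ satisfies dominance, while $(0.2, 0.4, 0.4)$ does not (the two less plausible classes jointly exceed their possibility of 0.5).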

These constraints are shown to be representable as a system of linear inequalities and difference bounds, exploiting structural properties of possibility/necessity measures, and can be constructed efficiently using the “antipignistic” transformation from possibility to probability distributions [dubois1983unfair, dubois1993possibility]. Importantly, the resulting feasible set is always nonempty (contains the antipignistic probability), convex, and closed. The intersection-of-convex-sets representation allows for modular extension with additional constraints, including varying types of qualitative and quantitative shape restrictions.
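The transformation itself admits a short implementation. The sketch below is our own, following the standard possibility-to-probability transform of Dubois and Prade (the paper's exact "antipignistic" construction may differ in details): classes are sorted by decreasing possibility and each mass increment is shared equally among the most plausible classes.

```python
import numpy as np

def possibility_to_probability(pi):
    """Map a normalized possibility distribution (max == 1) to a
    probability distribution that respects dominance and preserves the
    ordering and ties of the possibility degrees."""
    pi = np.asarray(pi, dtype=float)
    order = np.argsort(-pi)            # indices by decreasing possibility
    s = np.append(pi[order], 0.0)      # sorted degrees, with pi_{n+1} = 0
    n = len(pi)
    p_sorted = np.zeros(n)
    for j in range(n):
        # The increment s[j] - s[j+1] is split among the j+1 classes
        # whose possibility is at least s[j] (zero increment on ties).
        p_sorted[: j + 1] += (s[j] - s[j + 1]) / (j + 1)
    p = np.zeros(n)
    p[order] = p_sorted                # undo the sort
    return p
```

With $\pi = (1, 0.5, 0.5)$ this yields $(2/3, 1/6, 1/6)$: tied possibility degrees get equal probabilities, and the strictly more plausible class gets strictly more mass.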

Kullback-Leibler Projection via Dykstra’s Algorithm

Model outputs (softmax vectors) are generally unconstrained with respect to $\mathcal{F}^{\mathrm{box}}(\pi)$. The authors address this via KL projection: given a strictly positive prediction $q$, compute the closest admissible $p^\star \in \mathcal{F}^{\mathrm{box}}(\pi)$ under the KL divergence,

$$p^\star = \operatorname*{arg\,min}_{p \in \mathcal{F}^{\mathrm{box}}(\pi)} \mathrm{KL}(p \,\|\, q) = \operatorname*{arg\,min}_{p \in \mathcal{F}^{\mathrm{box}}(\pi)} \sum_{y \in Y} p(y) \log \frac{p(y)}{q(y)}.$$

The paper demonstrates that the KL divergence is a Bregman distance (induced by negative entropy), and that the KL projection onto $\mathcal{F}^{\mathrm{box}}(\pi)$ can be performed using Dykstra's iterative algorithm with explicit Bregman projections for each constraint set. Analytical expressions for the projections onto the dominance and order-preserving constraints are derived using KKT conditions, ensuring both computational tractability and numerical stability.
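To make the iteration concrete, here is a generic sketch of Dykstra's algorithm with multiplicative (KL/Bregman) correction terms, our own illustration rather than the paper's code. It uses two stand-in constraint sets whose KL projections have simple closed forms — elementwise box bounds (projection = clipping) and the simplex (projection = renormalization); the paper instead derives closed-form projections for its specific dominance and order-preserving sets.

```python
import numpy as np

def kl_proj_box(x, lo, hi):
    # The (generalized) KL projection onto {p : lo <= p <= hi} is clipping.
    return np.clip(x, lo, hi)

def kl_proj_simplex(x):
    # The KL projection onto the probability simplex is renormalization.
    return x / x.sum()

def dykstra_kl(q, projections, n_iter=100):
    """Dykstra's algorithm with Bregman (KL) projections.

    q           -- strictly positive starting vector
    projections -- KL-projection callables, one per convex constraint set
    The additive corrections of Euclidean Dykstra become multiplicative
    in the KL geometry.
    """
    x = np.asarray(q, dtype=float).copy()
    corr = [np.ones_like(x) for _ in projections]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            y = x * corr[i]          # re-apply this set's correction
            x_new = proj(y)          # Bregman projection onto set i
            corr[i] = y / x_new      # update the correction term
            x = x_new
    return x
```

For instance, projecting $q = (0.7, 0.2, 0.1)$ onto the intersection of the box $[0.1, 0.6]^3$ and the simplex converges to approximately $(0.6, 4/15, 2/15)$.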

Learning with Possibilistic Supervision

A novel training scheme utilizes these projections:

  • For each training instance, compute the model prediction $q$, then obtain its projection $p^\star \in \mathcal{F}^{\mathrm{box}}(\pi)$.
  • The model minimizes the KL divergence $\mathrm{KL}(p^\star \,\|\, q)$ between the projection and the prediction, i.e., its predictions are adjusted minimally (in the KL sense) to satisfy all possibilistic-induced constraints.
  • Importantly, the projection step is treated as “off-the-graph” (i.e., stop-gradient), which avoids introducing non-differentiabilities into model optimization.
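Because the projection is stop-gradiented, the per-instance objective reduces to an ordinary cross-entropy-style loss with the projected distribution as a fixed target. A minimal NumPy sketch (our own, not the authors' code; `p_star` stands for the strictly positive projected distribution):

```python
import numpy as np

def projection_loss(logits, p_star):
    """KL(p_star || q) with q = softmax(logits), treating p_star as a
    constant (stop-gradient) so gradients flow only through q.
    Returns the loss and its gradient w.r.t. the logits, which is the
    familiar cross-entropy form softmax(logits) - p_star."""
    z = logits - logits.max()               # stabilized softmax
    q = np.exp(z) / np.exp(z).sum()
    loss = np.sum(p_star * (np.log(p_star) - np.log(q)))
    grad = q - p_star
    return loss, grad
```

The gradient sums to zero across classes, and both loss and gradient vanish exactly when the prediction already lies at the target.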

This approach is compared to using a fixed target (the antipignistic probability derived from $\pi$), and to directly minimizing the KL divergence to the empirical class proportions when available.

Empirical Evaluation

Synthetic data and a real-world natural language inference (NLI) task (derived from the ChaosNLI dataset) are used to validate the approach. In both settings:

  • The proposed KL-projection-based objective outperforms fixed-target and naive label smoothing baselines, especially in regimes with high ambiguity/imprecision and on ambiguous subsets of the data.
  • The Dykstra algorithm–based KL projection is demonstrably efficient and numerically stable for typical problem sizes (numbers of classes up to several hundred), making this approach viable for practical use.
  • On the ChaosNLI task, the projection-based method achieves strong or best performance across a variety of train/validation/test settings (entire dataset, ambiguous-only, easy-only splits).

Strong Results and Contradicted Conventions

A technically bold result is the proof that order-preserving constraints ensure not only compatibility but also structural correspondence between the possibility and probability orderings—a property not guaranteed by mere dominance constraints. This makes the framework strictly stronger than related KL-projection approaches that enforce only the former (e.g., [lienen2023conformal]). The results also contradict the widely applied convention that softmax normalization suffices to transform possibilistic targets: the authors formally show this can lead to violation of essential dominance constraints.
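A minimal numeric illustration of the latter point (our own example, not taken from the paper): naively renormalizing a possibility distribution to sum to one can push the total probability of a set of moderately plausible classes above that set's possibility.

```python
# One fully plausible class and six moderately plausible ones
# (normalized possibility: max == 1).
pi = [1.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]

# Naive renormalization: scale the degrees to sum to 1.
total = sum(pi)
p = [v / total for v in pi]

# Dominance requires P(A) <= Pi(A) = max possibility within A.
A = range(1, 7)                      # the six moderately plausible classes
P_A = sum(p[i] for i in A)           # 1.8 / 2.8, roughly 0.643
Pi_A = max(pi[i] for i in A)         # 0.3
print(P_A > Pi_A)                    # True: dominance is violated
```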

Theoretical and Practical Implications

On the theoretical side, this work provides a rigorous geometric and algorithmic foundation for learning with imprecise, set-based, and non-probabilistic supervision—broadening the operational interface between possibilistic and probabilistic reasoning. The methodology directly enables:

  • Fine-grained modeling of epistemic uncertainty in supervised learning, where labels are ambiguous, conflicting, or aggregated from heterogeneous sources.
  • Integration of additional qualitative constraints into learning (e.g., monotonicity, symmetry, domain knowledge).
  • Applications to credal/conformal learning, robust learning under label noise, label relaxation, and soft annotation aggregation.
  • Modular extension, via the intersection-based constraint set, to interval-valued, partial, or alternative uncertainty calculi.

Practically, the framework provides an accessible route to leveraging ambiguous or crowd-sourced labels in neural classification while guaranteeing clear theoretical properties—robustness, shape-consistency, and constraint satisfaction.

Speculation on Future Directions

The projection-centric learning paradigm opens several avenues:

  • Application to large-scale crowdsourcing and human-in-the-loop annotation workflows, especially on subjective or fuzzy-domain datasets (e.g., FERPlus).
  • Extension to credal or more general imprecise-probability-based constraints, e.g., polyhedral credal sets, Dempster-Shafer masses, or interval probabilities.
  • Exploration of richer geometric constraints and their effect on generalization, calibration, and uncertainty quantification.
  • Tighter coupling with conformal prediction pipelines and probabilistically sound epistemic-uncertainty modeling.
  • Extension to structured output spaces, hierarchical classification, or multitask settings with complex label dependencies.

Conclusion

This work establishes a technically rigorous and computationally practical solution for probabilistic classification from possibilistic supervision by leveraging a KL-projection framework, grounded in principled convex-analytic and Bregman-projection methods. The approach achieves both strong theoretical guarantees and empirical improvements, especially under ambiguous and imprecise annotation regimes. Future developments will further exploit this geometric projection paradigm in broader AI and human-centered annotation contexts.


References:

For key algorithmic and theoretical underpinnings, see (2604.01939), as well as cited works within.
