- The paper introduces a KL-projection approach that computes projections from model outputs onto a convex set defined by possibilistic constraints.
- It derives explicit analytical expressions for the Bregman projections onto the dominance and order-preserving constraint sets, combined via Dykstra’s algorithm for computational efficiency.
- Empirical evaluations on synthetic and NLI tasks show improved performance under ambiguous, imprecise annotations compared to conventional methods.
Probabilistic Classification from Possibilistic Data via KL Projection: A Technical Synthesis
Motivation and Problem Setting
This paper addresses multi-class probabilistic classification under possibilistic supervision, where each data instance is annotated not with a categorical class label or probability vector, but with a normalized possibility distribution π over possible classes. Possibility theory, distinct from probability theory, encapsulates epistemic uncertainty (uncertainty arising from lack of knowledge) and induces dual possibility (Π) and necessity (N) measures. Practical situations motivating this paradigm include imprecise, incomplete, or crowd-aggregated annotations, where representing graded plausibility of classes is more meaningful than enforcing precise probability distributions.
The core technical challenge is twofold: (i) define a “feasible” convex set of probability distributions consistent with the provided possibility distribution, and (ii) train probabilistic classifiers (e.g., softmax neural networks) such that their output is compatible, in a precise sense, with this possibilistic supervision.
Constructing the Set of Feasible Probability Distributions
Given a normalized possibility distribution π over a finite support Y, the authors specify a closed convex set Fbox(π)⊂Δn (the probability simplex), comprising probability distributions p that must satisfy two classes of constraints:
- Dominance constraints (compatibility w.r.t. Π and N): for every subset A ⊆ Y, the induced probability satisfies N(A) ≤ P(A) ≤ Π(A), where Π(A) = max{π(y) : y ∈ A} and N(A) = 1 − Π(Y ∖ A).
- Order-preserving (shape) constraints: the ordering and equivalence of possibility degrees in π are preserved in p. That is, if π(y) > π(y′), then p(y) ≥ p(y′); if π(y) = π(y′), then p(y) = p(y′).
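For small label spaces, the dominance conditions can be verified by brute force over all subsets, using Π(A) = max{π(y) : y ∈ A} and N(A) = 1 − Π(Y ∖ A). The sketch below is our own illustration (function name and tolerance are ours, not the paper's):

```python
from itertools import combinations

def satisfies_dominance(p, pi, tol=1e-9):
    """Brute-force check of N(A) <= P(A) <= Pi(A) for every nonempty A.

    Pi(A) = max possibility inside A; N(A) = 1 - Pi(complement of A).
    Exponential in the number of classes, so for illustration only.
    """
    n = len(pi)
    for r in range(1, n + 1):
        for A in combinations(range(n), r):
            comp = [i for i in range(n) if i not in A]
            Pi_A = max(pi[i] for i in A)
            N_A = 1.0 - (max(pi[i] for i in comp) if comp else 0.0)
            P_A = sum(p[i] for i in A)
            if not (N_A - tol <= P_A <= Pi_A + tol):
                return False
    return True
```

For example, with π = (1, 0.5, 0.5) the uniform distribution fails, since P({y₁}) = 1/3 falls below N({y₁}) = 0.5.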
These constraints are shown to be representable as a system of linear inequalities and difference bounds, exploiting structural properties of possibility/necessity measures; moreover, a feasible point can be constructed efficiently via the “antipignistic” transformation from possibility to probability distributions [dubois1983unfair, dubois1993possibility]. Importantly, the resulting feasible set is always nonempty (it contains the antipignistic probability), convex, and closed. The intersection-of-convex-sets representation allows modular extension with additional constraints, including various qualitative and quantitative shape restrictions.
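The antipignistic transformation itself admits a short implementation. The sketch below assumes it coincides with the classical Dubois–Prade possibility-to-probability transform (sort π in decreasing order and share each possibility increment equally among the labels carrying it); the paper's exact definition may differ in details:

```python
import numpy as np

def antipignistic(pi):
    """Possibility-to-probability transform (assumed Dubois-Prade form):
    with pi sorted decreasingly and pi_(n+1) = 0, assign
    p_(i) = sum_{j >= i} (pi_(j) - pi_(j+1)) / j.
    """
    pi = np.asarray(pi, dtype=float)
    order = np.argsort(-pi)                  # labels by decreasing possibility
    s = pi[order]
    s_next = np.append(s[1:], 0.0)           # next-smaller level, 0 at the end
    inc = (s - s_next) / np.arange(1, len(s) + 1)
    p_sorted = np.cumsum(inc[::-1])[::-1]    # suffix sums over the increments
    p = np.empty_like(pi)
    p[order] = p_sorted
    return p
```

For π = (1, 0.5, 0.5) this yields p = (2/3, 1/6, 1/6), which satisfies both the dominance and order-preserving constraints.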
Kullback-Leibler Projection via Dykstra’s Algorithm
Model outputs (softmax vectors) are generally unconstrained with respect to Fbox(π). The authors address this via KL projection: given a prediction q ∈ Δn, compute the closest p* ∈ Fbox(π) under KL divergence,
p* = argmin_{p ∈ Fbox(π)} KL(p ∥ q).
The paper demonstrates that the KL divergence is a Bregman distance (induced by negative entropy), and that the KL projection onto Fbox(π) can be performed using Dykstra’s iterative algorithm with explicit Bregman projections for each constraint set. Analytical expressions for the projections onto the dominance and order-preserving constraints are derived using KKT conditions, ensuring both computational tractability and numerical stability.
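The paper's closed-form projections for the full constraint families are not reproduced here; as an illustrative stand-in, the sketch below runs Dykstra's algorithm with KL (Bregman) projections onto two simple sets, singleton upper bounds p(y) ≤ π(y) and the normalization constraint, working in the log domain for numerical stability (all names and the choice of constraint sets are ours):

```python
import numpy as np

def kl_project_dykstra(q, upper, n_iter=200, tol=1e-10):
    """Dykstra's algorithm with Bregman (KL) projections, in log domain.

    Projects q (assumed > 0) onto {p : p <= upper, sum(p) = 1}
    under KL(p || q). Illustrative stand-in for the paper's richer sets.
    """
    log_x = np.log(q)
    log_ub = np.log(upper)
    corr_box = np.zeros_like(log_x)   # Dykstra correction for the box set
    corr_sum = np.zeros_like(log_x)   # Dykstra correction for normalization
    for _ in range(n_iter):
        prev = log_x.copy()
        # KL projection onto {p <= upper} is a pointwise min (clip)
        log_y = log_x + corr_box
        log_x = np.minimum(log_y, log_ub)
        corr_box = log_y - log_x
        # KL projection onto {sum(p) = 1} is a rescaling (log-sum-exp shift)
        log_y = log_x + corr_sum
        m = log_y.max()
        log_x = log_y - (m + np.log(np.exp(log_y - m).sum()))
        corr_sum = log_y - log_x
        if np.max(np.abs(log_x - prev)) < tol:
            break
    return np.exp(log_x)
```

The KL projection onto {p ≤ u} is a pointwise min and onto {Σp = 1} a rescaling; Dykstra's correction terms are what make the iterates converge to the projection onto the intersection rather than merely to a feasible point.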
Learning with Possibilistic Supervision
A novel training scheme utilizes these projections:
- For each instance x with possibilistic label π, compute the model prediction q = f_θ(x), then obtain the projection p* of q onto Fbox(π).
- The model minimizes the KL divergence KL(p* ∥ q); i.e., its predictions are adjusted minimally (in the KL sense) to satisfy all constraints induced by the possibilistic label.
- Importantly, the projection step is treated as “off-the-graph” (i.e., stop-gradient), which avoids introducing non-differentiabilities into model optimization.
This approach is compared to using a fixed target (the antipignistic probability derived from π), and to directly minimizing KL to the empirical class proportions when available.
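A minimal sketch of the resulting training signal (our own illustration, with hypothetical helper names): because the projection p* is detached from the computation graph, the loss KL(p* ∥ softmax(z)) has the familiar softmax-minus-target gradient with respect to the logits z:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def projection_loss_and_grad(z, p_star):
    """KL(p_star || softmax(z)) with p_star treated as a constant
    (stop-gradient). With p_star held fixed, the logit gradient
    reduces to softmax(z) - p_star.
    """
    q = softmax(z)
    loss = float(np.sum(p_star * (np.log(p_star) - np.log(q))))
    grad = q - p_star   # d loss / d z, since p_star carries no gradient
    return loss, grad
```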
Empirical Evaluation
Synthetic data and a real-world natural language inference (NLI) task (derived from the ChaosNLI dataset) are used to validate the approach. On both settings:
- The proposed KL-projection-based objective outperforms fixed-target and naive label smoothing baselines, especially in regimes with high ambiguity/imprecision and on ambiguous subsets of the data.
- The Dykstra algorithm–based KL projection is demonstrably efficient and numerically stable for typical problem sizes (class counts up to several hundred), making this approach viable for practical use.
- On the ChaosNLI task, the projection-based method achieves strong or best performance across a variety of train/validation/test settings (entire dataset, ambiguous-only, easy-only splits).
Strong Results and Contradicted Conventions
A technically bold result is the proof that order-preserving constraints ensure not only compatibility but also structural correspondence between the possibility and probability orderings, a property not guaranteed by dominance constraints alone. This makes the framework strictly stronger than related KL-projection approaches that enforce only the latter (e.g., [lienen2023conformal]). The results also contradict the widely applied convention that softmax normalization suffices to convert possibilistic targets into probabilistic ones: the authors formally show that it can violate essential dominance constraints.
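A small numeric illustration (our own example, not from the paper) of how naively softmax-normalizing a possibility vector can break dominance:

```python
import numpy as np

# Illustrative numbers: one fully possible class, ten barely possible ones.
pi = np.array([1.0] + [0.1] * 10)
p = np.exp(pi) / np.exp(pi).sum()   # naive "softmax" of the possibility vector

# Dominance requires P(A) <= Pi(A) = max possibility inside A, for every A.
# Take A = the ten barely possible classes: Pi(A) = 0.1, yet
P_A = p[1:].sum()                   # roughly 0.8
assert P_A > 0.1                    # the dominance constraint is violated
```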
Theoretical and Practical Implications
On the theoretical side, this work provides a rigorous geometric and algorithmic foundation for learning with imprecise, set-based, and non-probabilistic supervision—broadening the operational interface between possibilistic and probabilistic reasoning. The methodology directly enables:
- Fine-grained modeling of epistemic uncertainty in supervised learning, where labels are ambiguous, conflicting, or aggregated from heterogeneous sources.
- Integration of additional qualitative constraints into learning (e.g., monotonicity, symmetry, domain knowledge).
- Applications to credal/conformal learning, robust learning under label noise, label relaxation, and soft annotation aggregation.
- Modular extension (via the intersection-based constraint set) to interval-valued, partial, or even alternative uncertainty calculi.
Practically, the framework provides an accessible route to leveraging ambiguous or crowd-sourced labels in neural classification while guaranteeing clear theoretical properties—robustness, shape-consistency, and constraint satisfaction.
Speculation on Future Directions
The projection-centric learning paradigm opens several avenues:
- Application to large-scale crowdsourcing and human-in-the-loop annotation workflows, especially on subjective or fuzzy-domain datasets (e.g., FERPlus).
- Extension to credal or more general imprecise-probability-based constraints, e.g., polyhedral credal sets, Dempster-Shafer masses, or interval probabilities.
- Exploration of richer geometric constraints and their effect on generalization, calibration, and uncertainty quantification.
- Tighter coupling with conformal prediction pipelines and probabilistically sound epistemic-uncertainty modeling.
- Extension to structured output spaces, hierarchical classification, or multitask settings with complex label dependencies.
Conclusion
This work establishes a technically rigorous and computationally practical solution for probabilistic classification from possibilistic supervision by leveraging a KL-projection framework, grounded in principled convex-analytic and Bregman-projection methods. The approach achieves both strong theoretical guarantees and empirical improvements, especially under ambiguous and imprecise annotation regimes. Future developments will further exploit this geometric projection paradigm in broader AI and human-centered annotation contexts.
References:
For key algorithmic and theoretical underpinnings, see (2604.01939), as well as cited works within.