Robustness of distilled privacy sensitivity classifiers
Establish the robustness of encoder-based privacy sensitivity classifiers distilled from Mistral Large 3 before they are deployed in automated pipelines. This requires calibrating their 1–5 Likert-scale scores, evaluating and improving performance on out-of-domain inputs, and auditing domain- and demographic-dependent failure modes.
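As an illustration only, the calibration and auditing steps might begin with two simple diagnostics: a reliability table comparing each predicted score level with the mean reference label, and a per-group error breakdown. The sketch below is not the source paper's method. The function names `reliability_table` and `mae_by_group`, the group tags, and the toy data are all hypothetical, and mean absolute error is just one possible choice of metric.

```python
from collections import defaultdict


def reliability_table(preds, labels):
    """Mean reference label for each rounded predicted score (1-5).

    For a well-calibrated scorer, the mean label in bucket k is close to k.
    """
    buckets = defaultdict(list)
    for p, y in zip(preds, labels):
        b = min(5, max(1, round(p)))  # clamp to the 1-5 Likert range
        buckets[b].append(y)
    return {b: sum(ys) / len(ys) for b, ys in sorted(buckets.items())}


def mae_by_group(preds, labels, groups):
    """Mean absolute error of the scores, split by a group tag.

    The tag could be a text domain or a demographic attribute (assumption).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        totals[g] += abs(p - y)
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical toy data: classifier scores, reference labels, domain tags.
preds = [1.2, 4.8, 3.1, 2.0, 4.4, 1.9]
labels = [1, 5, 3, 4, 5, 2]
groups = ["medical", "medical", "medical", "legal", "legal", "legal"]

print(reliability_table(preds, labels))
print(mae_by_group(preds, labels, groups))
```

A large gap between a bucket's score and its mean reference label suggests miscalibration at that level, and a group whose error is much higher than the others is a candidate failure mode to investigate further.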
References
Finally, robustness remains open: calibrating scores, dealing with out-of-domain inputs, and auditing domain- and demographic-dependent failure modes are essential before deploying the model as part of automated pipelines.
— Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models
(2603.29497 - Loiseau et al., 31 Mar 2026) in Section: Discussion — Future Work (final paragraph)