Reliability of Dataset-Based OOD Evaluation
Determine how reliably out-of-distribution (OOD) detection performance measured on individual benchmark datasets serves as an indicator of a model’s general ability to detect OOD examples across the broader space of plausible, untested inputs.
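One rough way to probe this question on the benchmarks that are available is to report the detection score per OOD dataset, together with the spread across datasets, rather than a single average. The sketch below is only an illustration under stated assumptions, not a method from the cited paper: it assumes a detector that outputs a scalar score where higher means "more likely OOD", it replaces real detector outputs with synthetic Gaussian scores, and the dataset names (`far_ood`, `mid_ood`, `near_ood`) and the helper `per_dataset_auroc` are hypothetical.

```python
import random
import statistics


def auroc(id_scores, ood_scores):
    """AUROC as the probability that a random OOD sample scores higher
    than a random in-distribution (ID) sample; ties count as 0.5."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def per_dataset_auroc(id_scores, ood_sets):
    """Hypothetical helper: AUROC of the ID set against each OOD dataset."""
    return {name: auroc(id_scores, scores) for name, scores in ood_sets.items()}


# Synthetic stand-ins for detector scores (assumption: higher = more OOD).
rng = random.Random(0)
id_scores = [rng.gauss(0.0, 1.0) for _ in range(500)]
ood_sets = {
    "far_ood": [rng.gauss(3.0, 1.0) for _ in range(500)],   # easy to separate
    "mid_ood": [rng.gauss(1.5, 1.0) for _ in range(500)],
    "near_ood": [rng.gauss(0.3, 1.0) for _ in range(500)],  # hard to separate
}

results = per_dataset_auroc(id_scores, ood_sets)
values = list(results.values())
mean_auroc = statistics.mean(values)   # the usual headline number
spread = max(values) - min(values)     # how much the datasets disagree
worst = min(results, key=results.get)  # dataset with the weakest detection

for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
print(f"mean = {mean_auroc:.3f}, spread = {spread:.3f}, worst = {worst}")
```

A large spread suggests that the average hides dataset-dependent behavior. A small spread is weaker evidence than it may look, since it only covers the datasets that were tested and says nothing direct about untested OOD inputs.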
References
Second, it remains unclear how reliably the OOD detection performance on specific data sets can indicate the general ability to detect OOD examples, as a large portion of plausible OOD inputs remains untested.
— AP-OOD: Attention Pooling for Out-of-Distribution Detection
(2602.06031 - Hofmann et al., 5 Feb 2026) in Section: Limitations and Future Work